
The VP's Guide to AI in Operations

A practical framework for operations leaders evaluating AI agents — what actually works, what doesn't, and how to avoid the most expensive mistakes.

CorePiper Team · March 13, 2026 · 20 min read


Quick Answer: Operations leaders evaluating AI agents should assess four dimensions: workflow fit (does it handle multi-system orchestration or only single-platform tasks), reliability architecture (does it have retry logic, error handling, and audit trails), human-in-the-loop (HITL) design (can humans intervene at any step), and total cost of ownership including implementation time and customization. The most common mistake is evaluating AI agents on demo performance rather than production reliability.

You Don't Need Another AI Pitch. You Need a Framework.

You've sat through the demos. You've seen the slides with the hockey-stick graphs. You've heard "autonomous resolution" so many times the words have lost all meaning.

And yet here you are — still running a team that's 30% understaffed, processing tickets manually across three platforms, watching your best people burn out on Tier-1 work that a well-configured system should handle. The board wants AI on the roadmap. Your CFO wants cost reduction. Your team wants relief. And every vendor in your inbox is promising all three.

The problem isn't that AI doesn't work. It does — in specific, well-scoped contexts. The problem is that most of what's being sold to you isn't designed for how operations actually works. It's designed for how product marketers imagine operations works.

This guide is different. No hype. No vendor comparison matrix. Just a practical framework for evaluating AI agents in operations — built from patterns we've seen across dozens of enterprise deployments, and grounded in what the data actually says.

What AI Agents Actually Do in Operations Today

Let's start with reality. Strip away the marketing and here's what AI agents can reliably do in operations environments in 2026:

The Proven Use Cases

1. Ticket Classification and Routing

AI agents can read an incoming ticket — whether it's in Salesforce, Zendesk, or Jira — understand the intent, classify it against your taxonomy, and route it to the right team. This isn't impressive technology anymore. It's table stakes. But it matters because misrouted tickets cost $15–25 each in wasted handling time, and the average enterprise operations team misroutes 12–18% of incoming volume (Gartner, 2025).

2. First-Response Automation

For well-documented issue types — status inquiries, standard claims filings, tracking updates, policy questions — AI agents can generate accurate first responses that resolve the ticket without human involvement. McKinsey's 2025 research on AI in operations found that 40–60% of Tier-1 operational tickets fall into categories that can be reliably automated, provided the underlying SOPs are well-documented.

3. Data Gathering and Enrichment

Before a human ever touches a complex ticket, an AI agent can pull customer history from Salesforce, check shipment status across carrier portals, retrieve relevant policy documents, and compile everything into a structured summary. This alone saves 8–12 minutes per complex ticket — time your senior ops people currently spend on copy-paste archaeology across systems.

4. SOP Execution

This is where it gets interesting. The best AI agents don't just answer questions — they follow procedures. File a claim. Escalate based on dollar threshold. Apply a credit according to policy. Check three systems, cross-reference the data, and take the action your SOP prescribes. This is the difference between a chatbot and an operational agent.

5. Cross-Platform Orchestration

Your operations don't live in one system. They live in Salesforce AND Jira AND Zendesk AND carrier portals AND your ERP. An AI agent that can only work in one of those systems solves maybe 20% of the problem. The agents that deliver real ROI are the ones that operate across your entire stack — reading from one system, writing to another, escalating in a third.

What AI Agents Can't Do (Yet)

Be equally clear-eyed about the limitations:

  • Novel problem-solving. If your team has never seen this issue before, neither has the AI. Agents follow patterns. They don't invent new ones.
  • Judgment calls with incomplete information. When the data is ambiguous, contradictory, or missing, you need a human. Full stop.
  • Relationship management. Your biggest customer is threatening to leave. That's not an AI conversation. That's a phone call from your VP.
  • Process design. AI agents execute processes. They don't design them. If your SOPs are broken, automating them just produces broken outcomes faster.

Deloitte's 2025 enterprise AI survey found that organizations with the highest AI ROI were the ones with the clearest boundaries around what they automated and what they didn't. The companies that tried to automate everything achieved 40% lower ROI than those that were selective.

The Evaluation Framework: Five Questions That Actually Matter

Forget the 47-point RFP template your procurement team wants to send. When you're evaluating AI agents for operations, there are exactly five questions that determine whether the investment pays off or becomes a very expensive mistake.

Question 1: How Long Until We See Value?

This is the single most important question, and it's the one most buyers get wrong.

The industry average for enterprise AI deployment is 3–6 months from contract signature to production (Forrester, 2025). Some vendors take 9–12 months. During that entire period, you're paying for the tool, paying for implementation services, and still processing every ticket manually.

Do the math. If your team processes 8,000 tickets per month at an average cost of $28 per ticket, every month of implementation delay costs you $224,000 in unrealized savings. A 6-month implementation? That's $1.34 million in value you didn't capture.
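
If you want to sanity-check that math against your own volumes, here's a minimal sketch. The figures are the illustrative ones from this example (8,000 tickets/month, $28/ticket) — substitute your own:

```python
def delay_cost(tickets_per_month: int, cost_per_ticket: float, months_delayed: int) -> float:
    """Savings left unrealized while the AI deployment is still in implementation."""
    return tickets_per_month * cost_per_ticket * months_delayed

# The worked example from above
one_month = delay_cost(8_000, 28.0, 1)   # $224,000 per month of delay
six_months = delay_cost(8_000, 28.0, 6)  # $1,344,000 over a 6-month rollout
print(f"${one_month:,.0f}/month of delay; ${six_months:,.0f} over six months")
```

Run it with your actual ticket volume and loaded cost per ticket before any vendor conversation — the number sets the ceiling on how much implementation time you can tolerate.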

The vendors that deliver fastest are the ones that don't require you to rebuild your workflows from scratch. They ingest your existing SOPs, connect to your existing platforms, and start executing. The best deployments we've seen go live in days, not months.

What to ask vendors:

  • "What's the median time from contract to first automated ticket resolution — not pilot, production?"
  • "What do you need from us before you can start? Data exports? API access? Dedicated engineering resources?"
  • "Show me three customers who went live in under 30 days. Let me talk to them."

Question 2: Does It Work the Way My Team Works?

Most AI tools force you to adapt your operations to the tool. They have their own workflow engine, their own escalation logic, their own way of doing things. Your team has to learn a new system, rebuild their processes inside it, and maintain two parallel workflows during the transition.

This is backwards.

The right approach is SOP-driven AI — agents that learn your existing procedures and execute them as-is. Your team documented how to handle a freight claim? The agent follows those exact steps. Your policy says escalate to a supervisor when the claim exceeds $5,000? The agent escalates at $5,000.

This matters for two reasons. First, it dramatically reduces onboarding time — the agent is learning your processes, not the other way around. Second, it means your operations team stays in control. When policies change, you update the SOP, and the agent adapts. You don't need to hire a "prompt engineer" or file a support ticket with the vendor.

What to ask vendors:

  • "Can I upload our existing SOPs and have the agent follow them, or do I need to rebuild workflows in your system?"
  • "When our policies change, how do we update the agent's behavior? Is it self-serve or do we need your team?"
  • "Show me how the agent handles an edge case that's covered in our documentation but wasn't in your training data."

Question 3: What Happens When It's Wrong?

Every AI agent will make mistakes. Every single one. The question isn't whether it will be wrong — it's what happens next.

There are two architectures in the market right now:

Fully autonomous agents resolve tickets without human review. They're fast. They're efficient. And when they're wrong, the customer finds out before your team does. Klarna deployed this model and saw customer satisfaction scores drop while re-contact rates climbed. They ended up re-hiring human agents to fix what the AI broke.

Human-in-the-loop (HITL) agents flag uncertain cases for human review before taking action. They're slightly slower on those flagged cases. But they catch errors before they reach the customer. And critically, every correction teaches the agent — creating a feedback loop that makes the system more accurate over time.

The data strongly favors HITL. IBM's 2025 research found that HITL architectures achieve 94% accuracy compared to 71% for fully autonomous systems in complex operational contexts. That 23-percentage-point gap translates directly to customer experience and rework costs.

A fully autonomous agent that's wrong 29% of the time isn't saving you money — it's creating a new category of work: cleaning up after the AI. Your team is now doing the original ticket handling PLUS damage control.

What to ask vendors:

  • "What percentage of tickets does your agent handle fully autonomously vs. flag for human review?"
  • "Show me the feedback loop. When a human corrects the agent, how does that correction get incorporated?"
  • "What's your false positive rate? How often does the agent confidently give the wrong answer?"

Question 4: Does It Work Across My Stack?

This is the question that eliminates 80% of vendors.

Your operations run on multiple platforms. Maybe it's Salesforce for CRM, Jira for internal workflows, and Zendesk for customer-facing support. Maybe it's a different combination. But it's never just one system.

Most AI agents are built for a single platform. Zendesk AI works in Zendesk. Salesforce Agentforce works in Salesforce. Ada works in Ada's widget. They can't see across your systems. They can't correlate a customer complaint in Zendesk with an internal investigation in Jira with a carrier claim in Salesforce. They can't execute a workflow that spans all three.

This is the biggest gap in the market right now. And it's the gap that matters most for operations, because operational workflows are inherently cross-platform. A freight claim starts with a customer email (Zendesk), triggers an investigation (Jira), requires carrier communication (portal or email), and results in a credit or denial (Salesforce). An AI agent that only operates in one of those systems automates one step of a ten-step process.

What to ask vendors:

  • "Which platforms does your agent natively integrate with? Not through Zapier or webhooks — native, bidirectional integration."
  • "Show me a workflow that starts in one system and completes in another. Walk me through exactly what happens at each step."
  • "If I add a new tool to my stack next year, what does integration look like?"

Question 5: What's the Real Total Cost?

Not the price on the website. Not the number on the proposal. The real, all-in, year-one cost including everything.

Here's what most vendors leave out of the initial quote:

  • Implementation services: $50K–$200K for enterprise deployments, often required, rarely included in headline pricing
  • Platform prerequisites: Some tools require other products (like Salesforce Data Cloud for Agentforce) that have their own licensing costs
  • Per-resolution fees: These scale with success — the better the AI works, the more you pay (Zendesk's model charges $1.50–$2.00 per automated resolution)
  • Dedicated resources: Most enterprise AI tools require 0.5–1 FTE from your team for ongoing management, prompt engineering, and optimization
  • Training costs: Your team needs to learn the new system. Budget 40–80 hours of team time for initial training, plus ongoing education as the tool evolves

Gartner's 2025 analysis of enterprise AI deployments found that actual first-year costs exceeded initial estimates by an average of 2.8x. The median enterprise AI deployment cost $380,000 in year one when all costs were included — regardless of the vendor's quoted price.
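
To see how quickly those line items stack up, here's a rough first-year roll-up. Every input below is an illustrative midpoint of the ranges above (the $100K subscription is a hypothetical headline price, not any vendor's quote):

```python
def first_year_tco(subscription, implementation, prerequisites,
                   per_resolution_fee, resolutions_per_year,
                   fte_fraction, fte_cost, training_hours, hourly_rate):
    """Roll up the cost lines vendors tend to quote separately -- or not at all."""
    return (subscription
            + implementation
            + prerequisites
            + per_resolution_fee * resolutions_per_year   # usage-based fees
            + fte_fraction * fte_cost                     # your dedicated headcount
            + training_hours * hourly_rate)               # initial team training

total = first_year_tco(
    subscription=100_000,       # hypothetical headline price
    implementation=125_000,     # midpoint of the $50K-$200K range
    prerequisites=0,            # assumes no mandatory add-on products
    per_resolution_fee=1.75, resolutions_per_year=60_000,
    fte_fraction=0.75, fte_cost=120_000,  # 0.5-1 FTE for ongoing management
    training_hours=60, hourly_rate=75,    # 40-80 hours of team time
)
print(f"All-in year one: ${total:,.0f}")
```

With these inputs the total lands around $425K — more than 4x the headline price, and in line with Gartner's $380K median. Put the vendor's written numbers into each slot before you sign.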

What to ask vendors:

  • "Give me a written estimate of total first-year cost including implementation, training, platform prerequisites, and any usage-based fees at our expected volume."
  • "What internal resources do we need to dedicate? Be specific — hours per week, skill sets required."
  • "What does year two look like? What are the renewal economics?"

The ROI Framework: Making the Business Case

Your CFO doesn't care about AI. Your CFO cares about margins, headcount efficiency, and predictable costs. Here's how to build a business case that speaks their language.

Step 1: Quantify the Current State

Before you can calculate ROI, you need to know what you're spending today. Most operations leaders underestimate this by 30–40% because they don't account for the full cost chain.

Direct costs per ticket:

  • Agent salary ÷ tickets handled per month = direct labor cost per ticket (typically $22–35 for Tier-1)
  • Supervisor review time for escalated tickets: add $15–25 per escalated ticket
  • Quality assurance sampling: add $3–5 per ticket across the volume

Indirect costs:

  • Agent turnover cost: $8,000–$15,000 per departed agent (recruitment + training + ramp time). If your annual turnover is 35% — and the industry average for Tier-1 ops roles is 38% (Bureau of Labor Statistics, 2025) — this is significant.
  • Misrouted ticket cost: $15–25 per misrouted ticket × your misroute rate
  • SLA breach cost: varies by contract, but penalties of $500–$5,000 per breach are common in enterprise logistics

Opportunity costs:

  • Senior ops people spending 40–60% of their time on work that doesn't require their expertise
  • Customer churn attributable to slow response times — Zendesk's 2025 benchmark report found that 67% of customers will switch providers after just two poor support experiences
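
The direct and indirect per-ticket costs above can be folded into one loaded number. This sketch uses midpoints of the ranges in this section — plug in your own rates:

```python
def loaded_cost_per_ticket(direct_labor, escalation_rate, supervisor_cost,
                           qa_cost, misroute_rate, misroute_cost):
    """Fully loaded per-ticket cost: direct labor plus the hidden cost chain."""
    return (direct_labor
            + escalation_rate * supervisor_cost   # supervisor review, weighted
            + qa_cost                             # QA sampling spread across volume
            + misroute_rate * misroute_cost)      # rework on misrouted tickets

cost = loaded_cost_per_ticket(
    direct_labor=28.0,                            # $22-35 Tier-1 range
    escalation_rate=0.15, supervisor_cost=20.0,   # $15-25 per escalated ticket
    qa_cost=4.0,                                  # $3-5 QA sampling
    misroute_rate=0.15, misroute_cost=20.0,       # 12-18% misroute rate, $15-25 each
)
print(f"Loaded cost: ${cost:.2f} per ticket")
```

With these midpoints the loaded figure comes out near $38 against $28 of direct labor — roughly the 30–40% underestimate described above, before turnover, SLA penalties, and opportunity costs are even counted.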

Step 2: Model the AI Impact

Be conservative. Use these ranges based on industry benchmarks:

Metric                             Conservative   Moderate   Aggressive
Tier-1 automation rate             30%            45%        60%
Average handle time reduction      20%            35%        50%
Misroute reduction                 40%            60%        80%
First-response time improvement    50%            70%        85%

Present the conservative case to your CFO. If the ROI works at 30% automation, it works. Period. You don't need to promise 60% to justify the investment. And under-promising creates room for you to be the hero when results exceed expectations.

Step 3: Calculate Payback Period

Payback period = Total first-year cost ÷ Monthly savings

For a typical mid-market operations team (50 agents, 10,000 tickets/month, $28 average ticket cost):

  • Conservative automation (30%): $84,000/month in savings
  • Moderate automation (45%): $126,000/month in savings

Against a total first-year cost of $150,000–$400,000 (depending on vendor), the payback period ranges from 1.2 to 4.8 months.
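
The payback arithmetic, using the mid-market profile above (10,000 tickets/month, $28 per ticket):

```python
def monthly_savings(tickets_per_month, cost_per_ticket, automation_rate):
    """Tickets removed from the human queue, valued at the loaded per-ticket cost."""
    return tickets_per_month * cost_per_ticket * automation_rate

def payback_months(first_year_cost, savings_per_month):
    return first_year_cost / savings_per_month

conservative = monthly_savings(10_000, 28.0, 0.30)  # ~$84,000/month
moderate = monthly_savings(10_000, 28.0, 0.45)      # ~$126,000/month

# Worst case: highest cost, conservative automation -- ~4.8 months
print(f"{payback_months(400_000, conservative):.1f} months")
# Best case: lowest cost, moderate automation -- ~1.2 months
print(f"{payback_months(150_000, moderate):.1f} months")
```

Present the worst-case cell to your CFO. If the project pays back in under five months at conservative assumptions, the debate is over before it starts.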

The math almost always works. The risk isn't in the ROI — it's in the execution. Which brings us to the most critical variable: can you actually get this thing deployed and working?

Step 4: Account for Time-to-Value

This is the killer that most business cases ignore. A tool that takes 6 months to deploy doesn't start generating returns until month 7. A tool that deploys in a week starts generating returns in week 2.

At $84,000/month in potential savings, a 5-month difference in deployment time equals $420,000 in unrealized value. That's not a rounding error. That's the difference between a successful project and a career-limiting one.

Build the time-to-value into your business case explicitly. Show the CFO two scenarios: fast deployment (under 30 days) and typical deployment (4–6 months). The delta makes the argument for you.

Red Flags: When to Walk Away

After watching dozens of AI deployments succeed and fail, certain patterns consistently predict failure. If you see these during evaluation, walk away — no matter how good the demo looked.

🚩 "We need 3–6 months to implement"

If a vendor needs half a year to get their product working, one of two things is true: either the product is genuinely complex to configure (meaning it'll be complex to maintain), or their professional services team is a profit center, not a support function. Either way, you're paying.

🚩 "Let us show you what our AI can do" (without your data)

A demo on the vendor's data tells you nothing about performance on your data. Your tickets have domain-specific terminology, your SOPs have edge cases, and your customers have expectations shaped by your industry. Demand a proof of concept on your actual data.

🚩 No clear answer on accuracy rates

If a vendor can't tell you their false positive rate, their automation accuracy, and their escalation rate — with numbers, from real customers — they either don't measure it or don't like what the numbers say.

🚩 Per-resolution pricing at scale

Per-resolution pricing creates a perverse incentive: the better the AI works, the more you pay. At 10,000 tickets per month with 50% automation and $1.75 per resolution, you're paying $105,000/year just in resolution fees — on top of your platform subscription. This model penalizes success.
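
You can model the perverse incentive directly. Using the example figures above ($1.75 per resolution at 10,000 tickets/month), watch what happens to fees as the agent improves:

```python
def annual_resolution_fees(tickets_per_month, automation_rate, fee_per_resolution):
    """Usage fees scale with the agent's success rate: the better it works, the more you pay."""
    return tickets_per_month * automation_rate * fee_per_resolution * 12

fees_at_50 = annual_resolution_fees(10_000, 0.50, 1.75)  # $105,000/year
fees_at_70 = annual_resolution_fees(10_000, 0.70, 1.75)  # improvement raises the bill
print(f"50% automation: ${fees_at_50:,.0f}/yr -> 70% automation: ${fees_at_70:,.0f}/yr")
```

Every point of automation improvement — the outcome you're paying for — raises your bill. Model this at the automation rate you hope to reach, not the one in the pilot.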

🚩 Single-platform limitation

If the agent only works in one of your three core systems, you're automating a fragment of your workflow. You'll still need humans to bridge the gaps between systems, and the ROI will be a fraction of what was projected.

🚩 "You'll need a dedicated AI team"

If the vendor requires you to hire prompt engineers, AI specialists, or dedicated administrators to keep the system running, factor that cost into your evaluation. A mid-level AI/ML engineer costs $150K–$200K fully loaded. That's not a software cost — that's a headcount cost, and it's recurring.

What "Good" Looks Like

After the red flags, here's the positive pattern. The deployments that succeed — the ones that deliver real ROI and earn expansion budgets — share these characteristics:

Fast onboarding. The best deployments go from contract to production in days, not months. This isn't just about saving time — it's about maintaining organizational momentum. The longer an AI project takes to launch, the more likely it is to lose executive sponsorship, hit budget reallocation cycles, or fall victim to organizational change.

SOP-driven configuration. The agent learns your processes from your documentation. When policies change, your ops team updates the docs, not a vendor consultant. This keeps your team in control and eliminates ongoing professional services costs.

Human-in-the-loop by design. The agent handles what it can confidently handle and escalates what it can't. Every human correction improves the system. Over time, the automation rate climbs — not because you're forcing it, but because the agent has genuinely learned.

Cross-platform execution. The agent works across Salesforce, Jira, Zendesk, and whatever else you use. It doesn't just read from these systems — it writes to them, executing complete workflows that span your entire stack.

Predictable pricing. You know what you're paying in month one, and you know what you'll pay at 2x or 5x scale. No per-resolution surprises. No mandatory add-ons discovered post-contract.

This is the model that CorePiper was built on — not because it's trendy, but because it's the only model that survives contact with how enterprise operations actually works. SOP-driven agents, one-day onboarding, human-in-the-loop architecture, native cross-platform execution across Salesforce, Jira, and Zendesk. These aren't features. They're requirements.

Getting Buy-In: The Internal Playbook

Having the right tool means nothing if you can't get your organization to approve and adopt it. Here's the playbook that works.

Build the Coalition Before the Proposal

Don't surprise your CFO with an AI proposal. By the time you present formally, three people should already be nodding along:

  • Your CFO or Finance lead: Pre-brief them on the cost data. Show them the current per-ticket cost, the industry benchmarks, and the competitor landscape. Frame AI not as a technology initiative but as an operational efficiency initiative. Finance people fund efficiency. They're skeptical of technology.
  • Your IT/Engineering lead: They'll need to approve integrations and security. Bring them in early, share the vendor's technical architecture, and let them identify concerns before the formal review. Nothing kills an AI project faster than a late-stage IT veto.
  • A peer VP or department head: If another department has deployed AI successfully (even a different type), their endorsement carries weight. If no one in your org has done it yet, use case studies from your industry — ideally companies your executives know and respect.

Frame It as Risk Reduction, Not Innovation

Innovation budgets get cut. Risk mitigation budgets don't.

Your framing: "We're processing X thousand tickets per month with a team that's Y% understaffed. Our SLA breach rate has increased Z% quarter over quarter. Agent turnover is at W% and rising. We're one attrition event away from a service crisis. AI automation isn't a nice-to-have — it's operational risk mitigation."

This framing works because it's true. And because it positions the AI investment not as a speculative bet but as an insurance policy against a concrete, quantifiable risk.

Start Small, Prove Fast

Don't propose automating your entire operation on day one. Propose a focused pilot:

  • One ticket type (pick high-volume, well-documented, low-complexity)
  • One team (pick the team with the best SOPs and the most supportive manager)
  • 30 days (long enough to generate data, short enough to maintain urgency)
  • Clear success metrics defined in advance (automation rate, accuracy, handle time reduction, team satisfaction)

A successful 30-day pilot with hard data is worth more than any vendor demo or analyst report. It's proof from your environment, your data, your team.

McKinsey's 2025 research on successful enterprise AI adoption found that organizations that started with focused pilots were 3.2x more likely to achieve full-scale deployment compared to those that attempted broad rollouts from the start.

Document Everything

Every ticket the AI handles, every correction a human makes, every minute saved — document it. You'll need this data for three things:

  1. Justifying expansion after the pilot ("Here's what 30 days of automation produced. Here's what full deployment would produce.")
  2. Defending the budget in the next cycle ("Here's the measurable ROI from Q1. Cutting this investment would cost us $X per month.")
  3. Building the internal case study that makes you the person who brought AI to operations successfully. That's a career-making outcome.

The Bottom Line

AI in operations isn't a question of if anymore. It's a question of how — and how well.

The vendors that will win your trust are the ones that deploy fast, work the way your team works, keep humans in control, operate across your entire stack, and charge you a predictable price for doing it. Everything else is noise.

The framework in this guide isn't theoretical. It's the distillation of what works — built from real deployments, real failures, and real data. Use it to cut through the pitches, ask the right questions, and make a decision you'll be proud of in 12 months.

Your team is drowning in manual work. Your board wants AI on the roadmap. The tools exist to solve both problems today. The only question is whether you'll choose the right one.

CorePiper exists because we believe operations teams deserve AI that works the way they do — not the other way around. If any part of this framework resonated, we should talk.

Ready to Lead With AI?

See how operations leaders are deploying AI agents that learn, adapt, and deliver measurable ROI.