74% of Enterprises Rolled Back Their AI Agents. Here's What They Did Wrong.
74% of enterprises rolled back their AI customer agents after launch. The model isn't the problem; three buyer-side mistakes are. Here's the build that survives.
74% of enterprises that put an AI customer agent into production have already rolled it back or shut it down. That's the headline from Sinch's AI Production Paradox report, published May 13[1]. Among companies with the most mature governance, the number climbs to 81%[2].
Every LinkedIn post this week is treating that stat like proof AI agents don't work. Most of them are wrong. The 74% isn't the failure of agents — it's the failure of how operators bought them, scoped them, and shipped them. I've been wiring agents for a year. The pattern is boring, repeatable, and almost entirely on the buyer's side.
Here's what actually broke, what didn't, and how I'd build the next one without ending up in that 74%.
What the Sinch study actually says
The study surveyed 2,527 senior decision-makers across 10 countries[1]. The headline number — 74% rollback — is specifically about AI customer communications agents, deployed and then pulled. Not paused. Not "iterated on." Pulled.
The Register picked it up the same day and quoted Sinch's own framing: the rollbacks were "tied to governance failures"[3]. That's the part everyone is skipping past. The agents didn't fail at conversation. They failed at being supervised. There's a difference.
Sinch also flagged a second number that's quietly more interesting: 98% of those same enterprises are still increasing AI investment in 2026[4]. The companies pulling agents aren't giving up on agents — they're rebuilding the foundation underneath them. That's a very different story from "AI doesn't work."
For context, IBM's just-released CTO survey (2,000 C-level execs) found only 11% feel completely prepared for the scale of agent operations they're committing to[5]. Two-thirds say they're being held responsible for AI systems they can't realistically supervise. That's the cliff.
Why most takes are wrong
The LinkedIn version of this story is: "see, AI agents are overhyped, told you so." It's wrong for three reasons.
First, the 74% is a customer-service-agent number. Customer-facing agents are the hardest category to deploy because every output is public, every failure becomes a screenshot, and every mistake costs a chargeback or a regulatory letter. Internal agents (the ones that triage tickets, summarize calls, prep meeting briefs) quietly work fine. Salesforce's 2026 data shows 96% of enterprises now report some level of agent adoption[6]. The boring ones don't make headlines.
Second, "rolled back" doesn't mean "shut down forever." In nearly every public post-mortem I've read, the rollback is followed by a smaller, scoped re-launch six to eight weeks later, this time with a human kept in the loop. Forrester data referenced in the Lumichats analysis shows 74% of successful production deployments kept explicit human checkpoints for the first 60-90 days of real operation[7]. The failed deployments removed humans on day one. That's the whole story.
Third, "governance failure" is doing a lot of work in that headline. Sinch defines it as anything from the agent acting outside policy to the agent making a customer commitment the company couldn't honor. In practice, almost every governance failure I've traced goes back to one of three buyer-side mistakes, and none of them is about the model.
The three mistakes I keep seeing
I've spent the last twelve months looking at operators' AI agent deployments, the ones that worked and the ones that didn't. The split is stark. Here are the three failure modes that show up in nearly every rollback:
1. Wrong scope on day one
Operators buy "an AI customer service agent" the way they used to buy a SaaS seat — assume it does the whole job, assume it scales linearly, assume it replaces a person. None of those are true on day one.
Atlan's analysis put it cleanly: most enterprise agent hallucinations aren't model failures — they're context failures, where the agent acts on data it can't verify and business rules it was never given[8]. If you launch an agent against your full ticket queue, you get hallucinations against every edge case your team learned over years. If you launch it against your top 3 ticket types — password resets, order status, return labels — you get a clean win in week one.
The buyers ending up in the 74% are the ones who skipped scoping. The 26% who survived shipped narrow, then expanded.
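A minimal sketch of what "shipping narrow" could look like in code, assuming tickets arrive already tagged with a type. The `Ticket` class, the `route` function, and the three type labels are hypothetical names used for illustration, not part of any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical labels for the three ticket types named above.
IN_SCOPE = {"password_reset", "order_status", "return_label"}


@dataclass
class Ticket:
    ticket_id: str
    ticket_type: str  # assumed to be set upstream by a form field or classifier
    body: str


def route(ticket: Ticket) -> str:
    """Send only allowlisted ticket types to the agent; everything else goes to a human."""
    return "agent" if ticket.ticket_type in IN_SCOPE else "human"
```

Expanding scope later then means adding one label to the allowlist, which keeps each expansion a deliberate, reviewable change.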
2. No observability before launch
Fiddler's analysis of agent failure rates pegs the production failure rate at 70-95% depending on category[9]. The kicker is why — failures compound across reasoning, tool calls, context limits, and cost. None of that is visible without instrumentation.
I've watched operators ship agents with zero logging beyond the OpenAI dashboard. When something breaks at 2 AM, they have no idea what the agent did, what it called, or what it told the customer. By the time they figure it out, the screenshot is on Twitter. Then they pull the plug.
The fix is unglamorous: every agent action gets logged, every tool call gets a trace, every customer-facing message gets reviewed daily for the first month. If you can't see what the agent did, you can't trust what it'll do tomorrow.
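As a rough illustration of that fix, here is a sketch of an append-only event log that can replay a single conversation. The function names, the JSON Lines file format, and the event kinds are assumptions chosen for the example; a real deployment would more likely use a tracing or logging service.

```python
import json
import time
from pathlib import Path


def log_event(log_path, conversation_id, kind, payload):
    """Append one agent event as a JSON line and return the record."""
    record = {
        "ts": time.time(),
        "conversation_id": conversation_id,
        "kind": kind,  # assumed kinds: "prompt", "tool_call", "response"
        "payload": payload,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def replay(log_path, conversation_id):
    """Return every logged event for one conversation, in the order written."""
    events = []
    with Path(log_path).open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["conversation_id"] == conversation_id:
                events.append(record)
    return events
```

When something breaks at 2 AM, `replay` is the call that answers what the agent did, what it called, and what it told the customer.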
3. Removing the human too early
This is the single biggest failure mode and it's almost entirely a buyer pressure problem. The board wants headcount savings. The CFO wants ROI in Q1. The vendor sold "autonomous." So the operator pulls the human reviewer in week two and the agent goes live solo.
Then the agent commits to a refund it can't process. Or it tells a customer their flight doesn't exist. Or it hallucinates a policy. And the rollback hits 90 days later.
Lumichats' analysis cites Forrester and McKinsey data showing the successful deployments — the 26% that didn't roll back — used 60 to 90 days of human-in-the-loop operation before removing oversight[7]. The failed ones averaged less than two weeks.
That's the entire delta. Two weeks of patience versus three months of patience. Everything else — the model, the prompt, the vendor — is downstream.
What I'd build instead
If you're an operator running a business between $1M and $20M and you want an AI agent in production by end of quarter, here's the build I'd put in:
Pick one ticket type, not a queue. Password resets. Order status. Booking confirmations. One workflow your team handles 200 times a week the same way. That's the agent's entire job. Not "customer service." One job.
Ship with the human in the loop. For the first 30 days, every agent message gets reviewed before it goes out. A Slack channel makes this cheap, and it catches 90% of edge cases before they go public. From week 4, the human reviews after the fact instead of before. From week 8, sample instead of reviewing every message. At week 12, you can talk about going autonomous. Maybe.
Log everything from day one. Every prompt, every tool call, every response, with timestamps and customer ID. If you can't replay a conversation end-to-end, don't ship the agent. The data costs nothing. The lack of it costs a rollback.
Set a kill switch. A single human-readable rule — "if the agent talks about pricing, refunds, or anything legal, escalate immediately" — wired in before the agent ever sees a customer. The companies in the 74% didn't have this. The ones who survived did.
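The kill-switch rule and the week-by-week review schedule could be combined into one gate that every draft message passes through. This is a sketch under stated assumptions: the keyword pattern is a crude stand-in for "pricing, refunds, or anything legal", the week boundaries follow the schedule above, and all names are hypothetical.

```python
import re

# Hypothetical keyword pattern for the escalation rule; a real rule set would be broader.
ESCALATION_PATTERN = re.compile(
    r"\b(pric\w*|refund\w*|legal\w*|lawsuit\w*|lawyer\w*|attorney\w*)\b",
    re.IGNORECASE,
)


def must_escalate(draft: str) -> bool:
    """True if the draft touches a topic the agent should never handle alone."""
    return ESCALATION_PATTERN.search(draft) is not None


def review_mode(weeks_live: int) -> str:
    """Map weeks in production to the level of human review."""
    if weeks_live < 4:
        return "review_before_send"
    if weeks_live < 8:
        return "review_after_send"
    return "sampled_review"


def decide(draft: str, weeks_live: int) -> str:
    """The kill switch wins over the schedule, no matter how long the agent has run."""
    if must_escalate(draft):
        return "escalate_to_human"
    return review_mode(weeks_live)
```

Checking the kill switch first is the design choice that matters here: even at week 20, a draft that mentions a refund still goes to a person. Keyword matching will over-escalate, which is the safer direction to be wrong in.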
That's the build. Boring. Slow. It ships in 30 days, not 30 minutes. It survives.
The signal in the noise
The 74% rollback is the story everyone is going to spend the next month arguing about. The actual signal underneath it — 98% of those same companies still increasing AI investment[4] — is the one that matters for operators.
The agents aren't going away. The vendors selling "autonomous from day one" are. The operators left standing at the end of 2026 are the ones who scoped narrow, kept humans on, logged everything, and gave themselves 90 days instead of two weeks.
That's the whole game.
If you're trying to figure out what an actual production-ready agent looks like for your stack — what's worth building, what to outsource, what to wait out — that's what the audit call is for. 30 minutes, no pitch, I'll tell you what your version would look like and what would put you in the 74%.
[1] "Sinch research reveals 74% of enterprises have rolled back live AI customer communications agents." Sinch's AI Production Paradox study found 74% of enterprises rolled back AI customer agents after deployment.
[2] "Sinch survey finds 74% of firms rolled back AI agents." Rollback rate rises to 81% among organisations with the most mature governance frameworks.
[3] "Dissatisfied: Three-fourths of AI customer service rollouts are a letdown." Independent coverage tying the rollbacks to governance failures; 81% rollback among firms with mature guardrails.
[4] "Sinch releases AI Production Paradox report." 98% of enterprises report increasing investment in AI communications in 2026.
[5] "CIOs and CTOs are making high-stakes decisions with incomplete information, IBM survey reveals." IBM's 2,000-CTO survey found only 11% feel completely prepared for the scale of agent operations.
[6] "1,600 AI Agents Per Enterprise: The Governance Gap." 96% of enterprises now report some level of agent adoption per Salesforce 2026 data.
[7] "What 'Human-in-the-Loop' Actually Means in Production." Successful deployments kept human-in-the-loop monitoring for the first 90 days of real operation.
[8] "AI Agent Hallucination: Causes, Risks & Context Solutions." Enterprise agent hallucinations are context failures, not model failures.
[9] "AI Agent Failure Rate: Why 70-95% Fail in Production." Production failure rates of 70-95% driven by compounding errors across reasoning, tools, context, and cost.