About one in five of our prototypes never makes it to production. That's not a bug — it's the system working as designed.
We treat retirement as a signal, not a failure. When we shut an agent down, we do a short debrief: what were we trying to solve, where did the signal break down, and what does that tell us about the next one? After 2025, we had enough of these to see the patterns clearly.
Across all of our 2025 prototypes, one in five was retired before it reached a production rollout. Each of those had a working agent — the gap was always in the surrounding context: the problem fit, the integration surface, or the workflow clarity.
Why we said goodbye
We analyzed every retired prototype from 2025 and found that failures didn't cluster around technical capability. The models worked. The pipelines ran. What failed was always one of three things.
The agent solved something real — just not something painful. No one noticed when it was down.
Every workflow touchpoint required a manual handoff or brittle API glue. The overhead outweighed the automation.
The agent handled the textbook case well. It broke on the 30% of exceptions that define real SMB workflows.
1. The “solution in search of a problem”
These were technically strong agents. Accurate classifications, reliable tool calls, clean outputs. But when we measured what happened when they went offline for a day, the answer was: nothing. No one noticed, no ticket was filed, no one asked when it would be back.
An agent is only as valuable as the time or cost it removes from a daily workflow. If the workflow can absorb its absence, it was solving a nice-to-have, not a bottleneck. We learned to test whether the pain is real before building, not after.
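One way to run that offline check is to count the reactions that land inside the outage window. The sketch below is a minimal illustration, not our actual tooling: the `Signal` record, the signal kinds, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Signal(NamedTuple):
    """Hypothetical record of someone reacting to the agent being down."""
    kind: str      # e.g. "ticket", "chat_question", "manual_workaround"
    at: datetime   # when the reaction happened


def signals_during_outage(
    signals: Iterable[Signal],
    outage_start: datetime,
    outage_hours: float = 24.0,
) -> List[Signal]:
    """Return only the signals that fall inside the outage window."""
    outage_end = outage_start + timedelta(hours=outage_hours)
    return [s for s in signals if outage_start <= s.at < outage_end]


# Made-up data: one reaction before the outage, one after, none during.
outage_start = datetime(2025, 3, 3, 9, 0)
observed = [
    Signal("ticket", datetime(2025, 3, 1, 10, 0)),
    Signal("chat_question", datetime(2025, 3, 5, 10, 0)),
]
noticed = signals_during_outage(observed, outage_start)
print(len(noticed))  # prints 0
```

An empty result after a full day offline is the nice-to-have pattern: the workflow absorbed the absence.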
2. The integration friction trap
Several prototypes required manual inputs at two or three points in the workflow. The agent saved time inside a single step but added overhead around it. The net result was roughly zero — or occasionally negative, because the manual steps now involved waiting for an AI response.
Even where a run came out ahead, a net gain of three minutes was not enough to justify the integration cost or the cognitive overhead of trusting a partial automation. Partial automation can be worse than none.
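The arithmetic behind that net result can be sketched as follows. This is an illustration under assumptions, not a measurement: the field names and every minute value are invented, chosen so that the first example lands on a three-minute gain.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Minutes per workflow run. All figures are illustrative assumptions."""
    manual_step_minutes: float   # time the step took before the agent
    agent_step_minutes: float    # time the same step takes with the agent
    handoff_minutes: float       # time per manual handoff around the agent
    handoffs_per_run: int        # number of manual handoffs per run
    wait_minutes: float = 0.0    # time spent waiting on the agent's response


def net_minutes_saved(t: RunTiming) -> float:
    """Time saved inside the step minus the overhead added around it."""
    saved_inside_step = t.manual_step_minutes - t.agent_step_minutes
    overhead = t.handoff_minutes * t.handoffs_per_run + t.wait_minutes
    return saved_inside_step - overhead


# Saves 10 minutes inside the step, adds 3 handoffs of 2 minutes plus 1 minute of waiting.
small_gain = RunTiming(12.0, 2.0, handoff_minutes=2.0, handoffs_per_run=3, wait_minutes=1.0)
print(net_minutes_saved(small_gain))  # prints 3.0

# Saves only 3 minutes inside the step, adds 2 handoffs of 2 minutes.
net_loss = RunTiming(5.0, 2.0, handoff_minutes=2.0, handoffs_per_run=2)
print(net_minutes_saved(net_loss))  # prints -1.0
```

The calculation only counts minutes. The integration cost and the cognitive overhead of trusting a partial automation sit on top of whatever number comes out.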
3. The contextual ambiguity gap
These were the most frustrating retirements. The agent performed well on the documented process — the 70% that looks like the manual. But SMB workflows live in the exceptions: the customer who is also a reseller, the invoice that spans two billing periods, the onboarding that forked three weeks ago for a custom deal.
When an agent breaks trust on the 30% of edge cases, it creates more work than it saves. Users stop trusting the output, start re-checking everything, and eventually route around the agent entirely. The exception is not a niche scenario — it's the most important one.
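A rough model shows how a 30% exception rate can flip an agent from saving work to creating it. The sketch below rests on simplifying assumptions: every exception needs manual rework, users who still trust the agent only re-check the exceptions, and users who have lost trust re-check every output. The minute values are invented for illustration.

```python
def human_minutes_per_item(
    exception_rate: float,
    recheck_minutes: float,
    rework_minutes: float,
    recheck_everything: bool,
) -> float:
    """Expected human minutes per item once the agent is in the loop."""
    # Share of outputs a person looks at: all of them once trust is gone.
    recheck_share = 1.0 if recheck_everything else exception_rate
    return recheck_share * recheck_minutes + exception_rate * rework_minutes


MANUAL_MINUTES = 4.0  # assumed time to handle one item fully by hand

trusted = human_minutes_per_item(0.3, 2.0, 10.0, recheck_everything=False)
untrusted = human_minutes_per_item(0.3, 2.0, 10.0, recheck_everything=True)
print(f"{trusted:.1f} {untrusted:.1f}")  # prints "3.6 5.0"
```

With these made-up numbers the agent beats the 4-minute manual baseline only while users trust it. Once they re-check everything, each item costs more human time than doing it by hand.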
What changed in how we build
These retirements reshaped our discovery and prototype phases in three concrete ways. None of them requires a new tool or a bigger model; they are process changes that front-load the hard questions.
Retirement is the system working
One of the questions we get most often is: “What's your success rate?” The honest answer is that 80% of our prototypes reach production — and we think a 100% rate would be a red flag.
An org that never retires an agent is either not exploring the edges or not being honest about what “working” means. The 20% we shut down buys us better intuition on the 80% we ship. The retirements are not waste — they are the research.
Ship the agents that earn their keep. Retire the ones that don't — quickly, without embarrassment, and with a clear note about why. That note is the most valuable document the project produces.
The retirement log is the real learning.