Field notes

Feb 24, 2026 · 5 min read

The agents we retired in 2025 (and what we learned)

Figure: 2025 agent cohort, 10 prototypes. Eight (AGT-01 to AGT-08) reached production (80%), with an average of 6 weeks to ship, 7 of 8 still running, and the outcome validated before build. Two were retired (20%): AGT-09 (misaligned problem) and AGT-10 (integration friction). Each retirement produced a debrief note.

By the studio

Agnotiq Studio

About one in five of our prototypes never makes it to production. That's not a bug; it's the system working as designed.

We treat retirement as a signal, not a failure. When we shut an agent down, we do a short debrief: what were we trying to solve, where did the signal break down, and what does that tell us about the next one? By the end of 2025, we had enough of these debriefs to see the patterns clearly.

20%: prototype retirement rate

Across all of our 2025 prototypes, one in five was retired before it reached a production rollout. Each of those had a working agent. The gap was always in the surrounding context: problem fit, the integration surface, or workflow clarity.

Why we said goodbye

We analyzed every retired prototype from 2025 and found that failures didn't cluster around technical capability. The models worked. The pipelines ran. What failed was always one of three things.

The three retirement patterns

01 · Misaligned problem
The agent solved something real, just not something painful. No one noticed when it was down.

02 · Integration friction
Every workflow touch required a manual hand-off or brittle API glue. The overhead outweighed the automation.

03 · Contextual gap
The agent handled the textbook case well. It broke on the 30% of exceptions that define real SMB workflows.

Every 2025 retirement mapped to one of these three root causes

1. The “solution in search of a problem”

These were technically strong agents: accurate classifications, reliable tool calls, clean outputs. But when we measured what happened when they went offline for a day, the answer was: nothing. No one noticed, no ticket was filed, no one asked when it would be back.

An agent is only as valuable as the time or cost it removes from a daily workflow. If the workflow can absorb its absence, it was solving a nice-to-have, not a bottleneck. We learned to validate that gap before building, not after.

2. The integration friction trap

Several prototypes required manual inputs at two or three points in the workflow. The agent saved time inside a single step but added overhead around it. The net result was roughly zero, or occasionally negative, because the manual steps now involved waiting for an AI response.

The friction math

Time saved per run: 12 min. Inside the automated step, the agent was fast and accurate.

Overhead added per run: 9 min. Manual hand-offs, context re-entry, and wait time between steps.

A net gain of three minutes per run was not enough to justify the integration cost or the cognitive overhead of trusting a partial automation. Partial automation can be worse than none.
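The friction math reduces to a subtraction, but it also helps to ask how many runs it takes for that margin to pay back the one-off integration work. Below is a minimal sketch, assuming per-run timings in minutes; `RunCost` and `runs_to_break_even` are hypothetical names, and the 40-hour integration effort in the usage line is an invented figure for illustration, not a number from our projects.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCost:
    """Per-run timings, in minutes, for one partially automated workflow (hypothetical helper)."""

    minutes_saved: float  # time the agent removes inside the automated step
    minutes_added: float  # hand-offs, context re-entry, and waiting around that step

    @property
    def net_minutes(self) -> float:
        """Net time saved per run; zero or negative means the automation does not pay."""
        return self.minutes_saved - self.minutes_added

    def runs_to_break_even(self, one_off_cost_minutes: float) -> float:
        """Runs needed before net savings cover a one-off integration cost (inf if never)."""
        if self.net_minutes <= 0:
            return math.inf
        return math.ceil(one_off_cost_minutes / self.net_minutes)


run = RunCost(minutes_saved=12, minutes_added=9)
print(run.net_minutes)                  # 3
print(run.runs_to_break_even(40 * 60))  # 800 (assumes a hypothetical 40-hour integration effort)
```

Under that assumed integration cost, a three-minute margin needs hundreds of runs just to recover the build time, before anyone weighs the cognitive cost of trusting a partial automation.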

3. The contextual ambiguity gap

These were the most frustrating retirements. The agent performed well on the documented process, the 70% that looks like the manual. But SMB workflows live in the exceptions: the customer who is also a reseller, the invoice that spans two billing periods, the onboarding that forked three weeks ago for a custom deal.

When an agent fails on the 30% of cases that are exceptions, it creates more work than it saves. Users stop trusting the output, start re-checking everything, and eventually route around the agent entirely. The exception is not a niche scenario; it's the most important one.

Figure: root causes of the 2025 retirements. Misaligned problem (AGT-09) and integration friction (AGT-10) are confirmed by retired agents; the contextual gap is an observed pattern.

What changed in how we build

These retirements reshaped our discovery and prototype phases in three concrete ways. None of them requires a new tool or a bigger model; they're process changes that front-load the hard questions.

01 · Outcome-first scoping

We will not start a prototype without a single sentence that names the exact business result: "reduces invoice query handling time by 40%" or "eliminates manual re-entry between CRM and billing." If we cannot write that sentence, the discovery is not done.

02 · Integration smoke tests in week one

Before we write a single prompt, we test every system boundary the agent will touch. If an API is rate-limited, fragile, or undocumented, we know about the friction before we have built anything that depends on it.

03 · Exception cataloguing before build

We now spend a session collecting edge cases with the client before prototyping starts. The goal is to find the 30% that breaks the model, and decide upfront whether the agent should handle those cases, hand off, or stay in its lane.
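Taken together, the three changes act as a gate on discovery. The sketch below records them as data and lists whatever is still open before a prototype starts. It is a hypothetical illustration, not a description of any specific internal tool: `Discovery`, `SmokeTest`, `ExceptionCase`, `Disposition`, and `open_questions` are invented names, and treating a failed boundary or an empty exception catalogue as an open question is one reading of the rules above rather than a fixed policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Disposition(Enum):
    """The three options for a catalogued exception."""

    HANDLE = "handle"              # the agent handles the case itself
    HAND_OFF = "hand_off"          # the agent routes the case to a person
    OUT_OF_SCOPE = "out_of_scope"  # the agent stays in its lane and leaves it alone


@dataclass
class ExceptionCase:
    description: str
    disposition: Disposition | None = None  # None means no decision has been made yet


@dataclass
class SmokeTest:
    boundary: str    # a system the agent will touch, e.g. a CRM or billing API
    passed: bool
    notes: str = ""  # e.g. "rate-limited", "undocumented"


@dataclass
class Discovery:
    outcome_sentence: str  # the single sentence naming the business result
    smoke_tests: list[SmokeTest] = field(default_factory=list)
    exceptions: list[ExceptionCase] = field(default_factory=list)


def open_questions(d: Discovery) -> list[str]:
    """Return what is still unresolved; an empty list means discovery is done."""
    issues: list[str] = []
    if not d.outcome_sentence.strip():
        issues.append("no outcome sentence")
    if not d.smoke_tests:
        issues.append("no system boundaries smoke-tested")
    issues += [f"boundary failed: {t.boundary}" for t in d.smoke_tests if not t.passed]
    if not d.exceptions:
        issues.append("exception catalogue is empty")
    issues += [
        f"undecided exception: {e.description}"
        for e in d.exceptions
        if e.disposition is None
    ]
    return issues


discovery = Discovery(
    outcome_sentence="Reduces invoice query handling time by 40%",
    smoke_tests=[SmokeTest("billing API", passed=False, notes="undocumented rate limit")],
    exceptions=[
        ExceptionCase("invoice spans two billing periods", Disposition.HAND_OFF),
        ExceptionCase("customer is also a reseller"),
    ],
)
print(open_questions(discovery))
# ['boundary failed: billing API', 'undecided exception: customer is also a reseller']
```

Making "stay in its lane" an explicit disposition, rather than leaving it implicit, keeps a deliberate "no" visible in the record.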

Retirement is the system working

One of the questions we get most often is: “What's your success rate?” The honest answer is that 80% of our prototypes reach production, and we think a 100% rate would be a red flag.

An org that never retires an agent is either not exploring the edges or not being honest about what “working” means. The 20% we shut down sharpens our intuition for the 80% we ship. The retirements are not waste; they are the research.

The principle

Ship the agents that earn their keep. Retire the ones that don't, quickly, without embarrassment, and with a clear note about why. That note is the most valuable document the project produces.

The retirement log is the real learning.

