Engineering

Jan 19, 2026 · 8 min read

Telemetry for agents: what we log and why

agent-metrics · production · LIVE

Metric: current (target)

01 Latency p95: 2.3s (target <3s) ✓
02 Error Rate: 1.2% (target <2%) ✓
03 Token Usage: 450 (target <500) ✓
04 Cost / Run: $0.15 (target <$0.20) ✓
05 Success Rate: 98.5% (target >95%) ✓

5 / 5 within target

30-day window · May 2026

By the studio

Agnotiq Studio

Autonomous agents are only as trustworthy as the window you have into what they're doing. At Agnotiq, telemetry is not an afterthought; it's the first thing we wire up before handing an agent any real work.

This post walks through the shape of our trace schema, the five dashboard metrics we watch daily, and, equally important, the alerts we deliberately skip so your team doesn't drown in noise.

The shape of our trace schema

Every agent run produces an OpenTelemetry-compatible JSON trace. The document captures the full journey from first prompt to final status, so any operator, whether engineer or business owner, can replay exactly what happened and why.

Trace document, key fields

trace_id / span_id: "a3f7c8d1-…"
Links every action in a single agent run into one queryable timeline.

attributes: { model, prompt, max_iter }
Input prompt, model name (e.g. claude-sonnet-4-6), and iteration cap.

events[ ]: LLM step · tool call · approval
Every thinking step, tool invocation, and human-in-the-loop gate.

status: "ok" | "error"
Terminal status, the single field that drives your success-rate metric.

All four fields travel together under a single trace_id.

Schema structure used across every Agnotiq agent deployment

The key insight in our schema design: events are append-only. We never overwrite a step. That means a post-mortem can walk the entire decision tree as it actually unfolded, not a reconstructed approximation.
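As a rough illustration of the append-only idea, here is a minimal Python sketch of a trace object whose events can be added but never edited or removed. The class names, the `kind` values, and the `detail` field are assumptions made for the example; only `trace_id`, `span_id`, `attributes`, `events`, and `status` come from the schema described above, and this is not the production implementation.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One recorded step. frozen=True blocks reassigning its fields."""
    kind: str  # hypothetical labels: "llm_step", "tool_call", "approval"
    name: str
    detail: Dict[str, Any] = field(default_factory=dict)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Trace:
    """A single agent run: attributes, an append-only event list, a status."""

    def __init__(self, model: str, prompt: str, max_iter: int) -> None:
        self.trace_id = str(uuid.uuid4())
        self.attributes = {"model": model, "prompt": prompt, "max_iter": max_iter}
        self._events: List[Event] = []
        self.status: Optional[str] = None

    def append(self, event: Event) -> None:
        """The only way to add a step; there is no update or delete."""
        if self.status is not None:
            raise RuntimeError("trace is closed; no further events accepted")
        self._events.append(event)

    @property
    def events(self) -> Tuple[Event, ...]:
        # Expose a tuple so callers cannot mutate the underlying list.
        return tuple(self._events)

    def close(self, status: str) -> None:
        if status not in ("ok", "error"):
            raise ValueError("status must be 'ok' or 'error'")
        self.status = status

    def to_json(self) -> str:
        return json.dumps({
            "trace_id": self.trace_id,
            "attributes": self.attributes,
            "events": [asdict(e) for e in self._events],
            "status": self.status,
        })
```

Because steps are only ever appended, replaying a run is just a matter of iterating `trace.events` in order.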

Dashboards that drive decisions

We track five core metrics on always-on dashboards. Everything else is derived from these; there is no “maybe useful later” column on our boards.

Core dashboard metrics, current vs target

Latency p95: 2.3s (target <3s)
Faster agents = happier customers

Error Rate: 1.2% (target <2%)
Minimize manual fixes

Token Usage: 450 (target <500)
Controls AI compute costs

Cost / Run: $0.15 (target <$0.20)
Predictable monthly spend

Success Rate: 98.5% (target >95%)
Reliable automation ROI

Metrics sampled from a live production deployment, May 2026
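To make the target check concrete, here is a small Python sketch that compares the reported values against the targets listed above. It also shows one common way to compute a p95 (the nearest-rank method) and a success rate from terminal statuses. The function names and dictionary keys are hypothetical, and the nearest-rank choice is an assumption; the post does not say which percentile method the dashboards use.

```python
def p95(values):
    """Nearest-rank 95th percentile (an assumed method, chosen for simplicity)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def success_rate(statuses):
    """Share of runs whose terminal status is "ok"."""
    return sum(1 for s in statuses if s == "ok") / len(statuses)


# Targets as listed on the dashboard: (comparison, limit).
TARGETS = {
    "latency_p95_s": ("<", 3.0),
    "error_rate": ("<", 0.02),
    "token_usage": ("<", 500),
    "cost_per_run_usd": ("<", 0.20),
    "success_rate": (">", 0.95),
}


def within_target(current):
    """Return {metric: True/False} for each metric in TARGETS."""
    result = {}
    for name, (op, limit) in TARGETS.items():
        value = current[name]
        result[name] = value < limit if op == "<" else value > limit
    return result


# Values reported in the table above.
reported = {
    "latency_p95_s": 2.3,
    "error_rate": 0.012,
    "token_usage": 450,
    "cost_per_run_usd": 0.15,
    "success_rate": 0.985,
}
checks = within_target(reported)
print(f"{sum(checks.values())} / {len(checks)} within target")
```

With the reported numbers, every check passes, which matches the "5 / 5 within target" summary on the dashboard.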

Success rate: 98.5%

The success-rate number looks great on its face. What makes it trustworthy is the 1.5% that failed: we know exactly which tool call timed out, which approval sat in the queue too long, and which prompt sent the agent off the rails. That failure visibility is how you defend a 98.5% figure to a customer without crossing your fingers.

Agent run success rate across a 30-day production window: 98.5% succeeded, 1.5% failed.

Every failure was traced to a specific tool timeout or approval delay; there were no invisible errors.
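One way to get that kind of failure attribution out of the traces is to walk each failed run's events and record the first step that reported an error. The Python sketch below assumes traces are plain dictionaries and that a failing event carries an `error` field next to a `kind` field; both field names are hypothetical and used only for illustration.

```python
from collections import Counter


def failure_cause(trace):
    """Label a failed run by the first event that carries an error."""
    for event in trace["events"]:
        if event.get("error"):
            return f'{event["kind"]}:{event["error"]}'
    # A failed run with no failing event would be an "invisible error".
    return "unattributed"


def failure_breakdown(traces):
    """Count failure causes across all runs whose status is "error"."""
    return Counter(
        failure_cause(t) for t in traces if t["status"] == "error"
    )


# Illustrative data only, not real production traces.
sample = [
    {"status": "ok", "events": [{"kind": "tool_call"}]},
    {"status": "error", "events": [
        {"kind": "llm_step"},
        {"kind": "tool_call", "error": "timeout"},
    ]},
    {"status": "error", "events": [
        {"kind": "approval", "error": "delayed"},
    ]},
]
breakdown = failure_breakdown(sample)
```

If `breakdown["unattributed"]` is ever above zero, some failure was not captured in the event log, which is the case the schema is meant to rule out.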

Alerts we don't bother with

Alert fatigue is a real cost. When everything pages, nothing gets fixed, and the team learns to ignore the channel. We made an explicit decision early on: no alert fires unless a human can take a meaningful action within the hour.

We skip

Minor hallucinations: auto-retry handles 95% of these without human touch.

Latency blips < 5 s: normal cloud variance, not a business signal.

Rare low-impact errors: < 1% of volume; not worth the noise on your Slack channel.

We fire on

Cost spike > 20%

Approval backlog > 2 h

Unlogged PII / compliance gap

These three hit your P&L directly. Everything else gets logged and reviewed in the weekly retro, not in Slack at 2 am.

Alert policy, every Agnotiq deployment ships with this default tier

The three things that fire (cost spikes, approval backlogs, and compliance gaps) share one property: a human decision in the next hour changes the outcome. Minor LLM quirks and cloud latency bumps don't meet that bar, so they don't get a page.
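The default tier can be expressed as a short rule set. The Python sketch below is one possible encoding, not the shipped policy: the `Snapshot` fields are hypothetical, and measuring the cost spike against a baseline figure is an assumption, since the policy above states the 20% threshold but not what it is measured against.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings used to evaluate the policy."""
    cost_usd: float                   # spend in the current period
    baseline_cost_usd: float          # assumed comparison baseline
    oldest_pending_approval_h: float  # age of the oldest waiting approval
    unlogged_pii_events: int          # compliance gaps detected


COST_SPIKE_RATIO = 0.20    # fire when cost rises more than 20%
APPROVAL_BACKLOG_H = 2.0   # fire when an approval waits more than 2 h


def page_reasons(s: Snapshot) -> List[str]:
    """Return the reasons to page a human; an empty list means stay quiet."""
    reasons = []
    if s.baseline_cost_usd > 0:
        increase = (s.cost_usd - s.baseline_cost_usd) / s.baseline_cost_usd
        if increase > COST_SPIKE_RATIO:
            reasons.append("cost_spike")
    if s.oldest_pending_approval_h > APPROVAL_BACKLOG_H:
        reasons.append("approval_backlog")
    if s.unlogged_pii_events > 0:
        reasons.append("compliance_gap")
    return reasons
```

Latency blips and minor hallucinations have no rule here on purpose: they are logged for the weekly retro instead of paging anyone.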

What this gives your business

The trace schema and dashboard are not engineering vanity metrics. They are the artifact your operations team reads when something feels off, and the evidence your finance team reaches for when they want to justify the line item. Properly structured telemetry typically lets Agnotiq customers cut AI compute costs 30-50% within the first quarter, not because the agents change, but because the visibility exposes waste that was invisible before.

Cost reduction: 30-50% (typical first-quarter improvement)

Mean time to insight: < 1 h (from incident to root cause in traces)

Ops confidence: 2× (teams that can see it, trust it)

Bottom line

Telemetry is not tech debt you pay off later. It is the control panel you need from day one. With the right schema, five focused metrics, and an alert policy built around business impact rather than engineering noise, your agents stop being a black box and start being something you can defend, optimize, and scale with confidence.

If you'd like to walk through how we'd instrument your specific workflow, drop us a line at hello@agnotiq.com. The logs already know the answer.
