Engineering
Jan 19, 2026 · 8 min read

Telemetry for agents: what we log and why

agent-metrics · production · LIVE

Metric         Current   Target
Latency p95    2.3s      <3s
Error Rate     1.2%      <2%
Token Usage    450       <500
Cost / Run     $0.15     <$0.20
Success Rate   98.5%     >95%

5 / 5 within target · 30-day window · May 2026
By the studio
Agnotiq Studio

Autonomous agents are only as trustworthy as the window you have into what they're doing. At Agnotiq, telemetry is not an afterthought — it's the first thing we wire before handing an agent any real work.

This post walks through the shape of our trace schema, the five dashboard metrics we watch daily, and — equally important — the alerts we deliberately skip so your team doesn't drown in noise.

The shape of our trace schema

Every agent run produces an OpenTelemetry-compatible JSON trace. The document captures the full journey from first prompt to final status, so any operator — engineer or business owner — can replay exactly what happened and why.

Trace document: key fields

trace_id / span_id (e.g. "a3f7c8d1-…"): links every action in a single agent run into one queryable timeline.
attributes ({ model, prompt, max_iter }): input prompt, model name (e.g. claude-sonnet-4-6), and iteration cap.
events[ ] (LLM step · tool call · approval): every thinking step, tool invocation, and human-in-the-loop gate.
status ("ok" | "error"): terminal status, the single field that drives your success-rate metric.

All four fields travel together under a single trace_id.
Schema structure used across every Agnotiq agent deployment

The key insight in our schema design: events are append-only. We never overwrite a step. That means a post-mortem can walk the entire decision tree as it actually unfolded, not a reconstructed approximation.
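To make the append-only idea concrete, here is a minimal Python sketch of a trace document carrying the four fields described above. It is an illustration under stated assumptions, not Agnotiq's actual schema and not the OpenTelemetry SDK: the class names (Trace, Event), the event kind labels, and the new_trace helper are all hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One recorded step: an LLM step, a tool call, or an approval gate."""
    kind: str  # hypothetical labels: "llm_step" | "tool_call" | "approval"
    name: str
    detail: dict[str, Any] = field(default_factory=dict)


@dataclass
class Trace:
    trace_id: str
    span_id: str
    attributes: dict[str, Any]  # model, prompt, max_iter
    status: str = "ok"          # "ok" | "error"
    _events: list[Event] = field(default_factory=list, repr=False)

    def append_event(self, event: Event) -> None:
        # Append-only: the class offers no way to edit or delete a step.
        self._events.append(event)

    @property
    def events(self) -> tuple[Event, ...]:
        # Return an immutable snapshot so readers cannot rewrite history.
        return tuple(self._events)

    def to_json(self) -> str:
        return json.dumps({
            "trace_id": self.trace_id,
            "span_id": self.span_id,
            "attributes": self.attributes,
            "events": [asdict(e) for e in self._events],
            "status": self.status,
        })


def new_trace(model: str, prompt: str, max_iter: int) -> Trace:
    """Start a trace with fresh identifiers (ID format is an assumption)."""
    return Trace(
        trace_id=str(uuid.uuid4()),
        span_id=uuid.uuid4().hex[:16],
        attributes={"model": model, "prompt": prompt, "max_iter": max_iter},
    )
```

Because steps are only ever appended, replaying a run is a matter of reading trace.events in order.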

Dashboards that drive decisions

We track five core metrics on always-on dashboards. Everything else is derived from these — there is no “maybe useful later” column on our boards.

Core dashboard metrics: current vs target

Metric         Current   Target    Why it matters
Latency p95    2.3s      <3s       Faster agents = happier customers
Error Rate     1.2%      <2%       Minimize manual fixes
Token Usage    450       <500      Controls AI compute costs
Cost / Run     $0.15     <$0.20    Predictable monthly spend
Success Rate   98.5%     >95%      Reliable automation ROI

Metrics sampled from a live production deployment, May 2026
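As a rough illustration, the five metrics above could be derived from per-run summaries along these lines. This is a sketch under assumptions, not the production pipeline: the RunSummary fields, the nearest-rank p95, and the reading of "Token Usage" and "Cost / Run" as per-run averages are all guesses, and the post does not say how Error Rate is defined, so here it is simply the share of runs whose status is "error".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run rollup extracted from a trace."""
    latency_s: float
    tokens: int
    cost_usd: float
    status: str  # "ok" | "error"


# Targets as shown on the dashboard: (comparison, limit).
TARGETS = {
    "latency_p95_s": ("<", 3.0),
    "error_rate_pct": ("<", 2.0),
    "tokens_avg": ("<", 500),
    "cost_per_run_usd": ("<", 0.20),
    "success_rate_pct": (">", 95.0),
}


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def dashboard(runs: list[RunSummary]) -> dict[str, float]:
    """Compute the five metrics; expects at least one run."""
    n = len(runs)
    errors = sum(1 for r in runs if r.status == "error")
    return {
        "latency_p95_s": p95([r.latency_s for r in runs]),
        "error_rate_pct": 100.0 * errors / n,
        "tokens_avg": sum(r.tokens for r in runs) / n,
        "cost_per_run_usd": sum(r.cost_usd for r in runs) / n,
        "success_rate_pct": 100.0 * (n - errors) / n,
    }


def within_target(metrics: dict[str, float]) -> dict[str, bool]:
    """Compare each metric against its strict target."""
    result = {}
    for name, (op, limit) in TARGETS.items():
        value = metrics[name]
        result[name] = value < limit if op == "<" else value > limit
    return result
```

The targets are strict inequalities, so a value sitting exactly on the limit counts as out of target.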

The success-rate number looks great on its face. What makes it trustworthy is the 1.5% that failed: we know exactly which tool call timed out, which approval sat in the queue too long, and which prompt sent the agent off the rails. That failure visibility is how you defend a 98.5% figure to a customer without crossing your fingers.

Figure: agent run success rate across a 30-day production window. 98.5% succeeded, 1.5% failed. Every failure was traced to a specific tool timeout or approval delay; no invisible errors.
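One way to attribute each failed run to a specific step is to walk its events in order and report the first one marked as failed. The sketch below assumes traces are plain dictionaries and that a failing event carries a truthy "error" key; that flag and the "kind:name" label format are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def failure_cause(trace: dict) -> str:
    """Return 'kind:name' of the first failed event, or 'unknown'."""
    for event in trace["events"]:
        if event.get("error"):  # hypothetical per-event failure flag
            return f'{event["kind"]}:{event["name"]}'
    return "unknown"


def failure_breakdown(traces: list[dict]) -> Counter:
    """Count failed runs by the step that caused them."""
    return Counter(
        failure_cause(t) for t in traces if t["status"] == "error"
    )
```

Any run that lands in the "unknown" bucket is itself a telemetry gap worth fixing, since the goal is that no failure stays invisible.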

Alerts we don't bother with

Alert fatigue is a real cost. When everything pages, nothing gets fixed — the team learns to ignore the channel. We made an explicit decision early on: no alert fires unless a human can take a meaningful action within the hour.

We skip:
Minor hallucinations: auto-retry handles 95% of these without human touch.
Latency blips < 5 s: normal cloud variance, not a business signal.
Rare low-impact errors: < 1% volume; not worth the noise on your Slack channel.

We fire on:
Cost spike > 20%
Approval backlog > 2 h
Unlogged PII / compliance gap

These three hit your P&L directly. Everything else gets logged and reviewed in the weekly retro, not in Slack at 2 am.

Alert policy: every Agnotiq deployment ships with this default tier

The three things that fire — cost spikes, approval backlogs, and compliance gaps — share one property: a human decision in the next hour changes the outcome. Minor LLM quirks and cloud latency bumps don't meet that bar, so they don't get a page.
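The firing rules can be expressed as a small pure function. This is a sketch rather than the shipped policy: the Snapshot fields are hypothetical, and the post does not say what baseline a cost spike is measured against, so the example assumes a caller-supplied baseline such as a trailing daily average.

```python
from dataclasses import dataclass

COST_SPIKE_PCT = 20.0         # fire when cost rises more than 20%
APPROVAL_BACKLOG_HOURS = 2.0  # fire when an approval waits more than 2 h


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time readings fed to the alert check."""
    cost_today_usd: float
    cost_baseline_usd: float  # assumed baseline, e.g. a trailing average
    oldest_pending_approval_hours: float
    unlogged_pii_events: int


def alerts_to_fire(s: Snapshot) -> list[str]:
    """Return only the alerts a human can act on within the hour."""
    alerts = []
    if s.cost_baseline_usd > 0:
        delta = s.cost_today_usd - s.cost_baseline_usd
        if 100.0 * delta / s.cost_baseline_usd > COST_SPIKE_PCT:
            alerts.append("cost_spike")
    if s.oldest_pending_approval_hours > APPROVAL_BACKLOG_HOURS:
        alerts.append("approval_backlog")
    if s.unlogged_pii_events > 0:
        alerts.append("compliance_gap")
    # Hallucination retries, latency blips, and rare low-impact errors
    # are deliberately absent: they are logged for the weekly retro only.
    return alerts
```

Keeping the policy as a pure function of a snapshot makes it easy to unit-test each threshold before it ever pages anyone.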

What this gives your business

The trace schema and dashboard are not engineering vanity projects. They are the artifacts your operations team reads when something feels off, and the evidence your finance team reaches for when they want to justify the line item. Properly structured telemetry typically lets Agnotiq customers cut AI compute costs 30–50% within the first quarter, not because the agents change, but because the visibility exposes waste that was hidden before.

30–50% cost reduction: typical first-quarter improvement
< 1 h mean time to insight: from incident to root cause in traces
Ops confidence: teams that can see it, trust it

Bottom line

Telemetry is not tech debt you pay off later. It is the control panel you need from day one. With the right schema, five focused metrics, and an alert policy built around business impact rather than engineering noise, your agents stop being a black box and start being something you can defend, optimize, and scale with confidence.

If you'd like to walk through how we'd instrument your specific workflow, drop us a line at hello@agnotiq.com. The logs already know the answer.
