Engineering

Mar 30, 2026 · 6 min read

Routing between frontier and open models without losing sleep

model-router · dispatch-log

LIVE

REQTask previewModel routedCost

a3f8c“What are your office hours?”

haiku-4.5

$0.0003

b7c2e“Draft a Q3 strategy proposal”

opus-4.7

$0.042

d1e9a“Classify this support ticket”

haiku-4.5

$0.0001

f5a30“Summarise last week's meeting”

haiku-4.5

$0.0004

k2m1f“Analyse competitor pricing data”

opus-4.7

$0.068

p9q40“Extract contact from intake form”

haiku-4.5

$0.0002

r7s0b“Are these SLA terms standard?”

haiku-4.5

$0.0003

m3n8d“Translate FAQ section to Spanish”

haiku-4.5

$0.0002

75% small model

25% frontier

avg $0.014 / req~6× cheaper than all-frontier

By the studio

Agnotiq Studio

Most businesses don't care which model is answering the question. They care that it's fast, accurate, and doesn't bankrupt them.

Yet behind the scenes, you're probably juggling frontier models GPT-5-class, Claude, Gemini, and open-source or smaller alternatives. The trick is a small, boring piece of plumbing that quietly decides: “Which model gets this call?” That's all routing is. And it's quietly one of the biggest levers on your AI cost, quality, and long-term flexibility.

Why this matters for small and midsize businesses

For SMBs, every dollar of AI spend is visible. You're not an enterprise with a “whatever it costs” AI budget. Frontier models are powerful but expensive for routine work. Open and smaller models are often good enough for 80-90% of tasks and cost a fraction.

If you route every request through a frontier model, you're paying for a sports car when you need a station wagon.

Where your requests actually go

Routine tasks (FAQs, triage, extraction)80-90%

→ Small / open model, fraction of the cost

Complex reasoning, drafting, strategy10-20%

→ Frontier model, pay for what you actually need

A typical SMB request mix once routing is in place

The real cost of naïve AI wiring

Most teams start with something like: “All AI requests go to the frontier model, it's the best.” This feels responsible. It isn't.

Signs you're over-routing to frontier models

Opaque cost spikes at end of month with no clear culprit
Simple FAQ responses taking 2-3 seconds when they could be instant
Paying for reasoning headroom you never actually use
Every retry and edge case runs on the most expensive tier

Routing fixes this at the architecture level, not the spreadsheet level. Once it's in place, you stop debugging AI costs and start predicting them.

What “routing” actually means

Think of routing as a traffic cop sitting between your users and your models. A simple request, “What are your office hours?” goes to a small, cheap model. A complex one, “Draft a proposal for a 3-month pilot with 12 stakeholders”, goes to the frontier model. You're not forcing everything through one model; you're sending the right task to the right model at the right price.

How to route intelligently (without overengineering)

You don't need FinTech-grade AI orchestration software. For SMBs, good routing looks like three filters applied in order.

Routing decision, three filters in order

01

Task complexity

Simple / repeatable → small modelComplex reasoning → frontier

02

Sensitivity & privacy

Internal / sensitive → self-hostedPublic-facing → frontier API

03

Cost ceiling

Set a max cost per call (e.g. $0.005). If the smaller model delivers, it wins. Only escalate when quality requires it.

Apply these filters in sequence, most requests never reach filter 3

Where routing fits in your stack

The routing layer is thin by design. It sits between your application and the model pool, invisible to your users, inexpensive to maintain, and easy to tune as your usage patterns shift.

Your AI stack

User request

Any channel, chat, API, form

Routing layer

Classifies, decides, dispatches

Model pool

Frontier · Open · Self-hosted

Response

Fast, consistent, cost-controlled

The routing layer is the only new piece, everything else already exists

How to start with routing today

You don't need to build everything from scratch. Start with two or three clear task buckets, pick one frontier model for the heavy-lifting bucket, pick one cheaper model for the rest, and route via a simple conditional in your code or workflow engine.

01

Define 2-3 task buckets

FAQs and triage / drafting and strategy / internal and sensitive. Three buckets covers most SMB use cases.

02

Pick one frontier model

For the bucket that genuinely needs it. Don't debate which one, pick the one your team already knows.

03

Pick one or two cheaper models

Open weights or a smaller API model. Benchmark against your real tasks, not synthetic benchmarks.

04

Route via a simple conditional

A function that looks at task type and cost ceiling and returns a model ID. Ship it. Tune later.

Eventually, routing becomes a silent, revenue-protecting component of your platform, like a well-tuned HVAC system: nobody thinks about it until it stops working.

How Agnotiq approaches this

We pre-wire routing into every workflow we build. Frontier models handle complex orchestration and reasoning; smaller, open models handle repetitive, high-volume tasks. You get the best of both worlds without the cost of using frontier models for everything, and without spending a week debugging why your AI bill doubled.

Route the request. Pay for what you need.

← Back to all field notes Talk to the team

Keep reading

Studio · May 2, 2026

What we read in 2026

Six papers, blogs, and books the Agnotiq studio kept returning to, all focused on agentic AI for the work that actually slows SMBs down.

By the studioRead

Field notes · Apr 18, 2026

Why we write the eval suite before we write the agent

On the discipline of starting from a scoring function, and the awkward first week where the prototype scores 31%.

By the studioRead

Case study · Mar 12, 2026

What 3.8M conversations taught us about ticket triage

The tickets that look easy are the ones that bite. Notes from a year of production triage.

By the studioRead

Let's build

Have a workflow that deserves an agent?

Tell us what's eating your team's afternoons. We'll come back inside three days with a discovery plan, a price, and the names of the engineers we'd put on it.

Start a project hello@agnotiq.com