ServicesProcessWorkChatAboutPricingBlogBook a call
Engineering
Mar 30, 2026 · 6 min read

Routing between frontier and open models without losing sleep

model-router · dispatch-log
LIVE
Task previewModel routed
What are your office hours?
haiku-4.5
Draft a Q3 strategy proposal
opus-4.7
Classify this support ticket
haiku-4.5
Summarise last week's meeting
haiku-4.5
Analyse competitor pricing data
opus-4.7
Extract contact from intake form
haiku-4.5
Are these SLA terms standard?
haiku-4.5
Translate FAQ section to Spanish
haiku-4.5
75% small model
25% frontier
avg $0.014 / req~6× cheaper than all-frontier
By the studio
Agnotiq Studio

Most businesses don't care which model is answering the question. They care that it's fast, accurate, and doesn't bankrupt them.

Yet behind the scenes, you're probably juggling frontier models — GPT-5-class, Claude, Gemini — and open-source or smaller alternatives. The trick is a small, boring piece of plumbing that quietly decides: “Which model gets this call?” That's all routing is. And it's quietly one of the biggest levers on your AI cost, quality, and long-term flexibility.

Why this matters for small and midsize businesses

For SMBs, every dollar of AI spend is visible. You're not an enterprise with a “whatever it costs” AI budget. Frontier models are powerful but expensive for routine work. Open and smaller models are often good enough for 80–90% of tasks and cost a fraction.

If you route every request through a frontier model, you're paying for a sports car when you need a station wagon.

Where your requests actually go
Routine tasks (FAQs, triage, extraction)80–90%
→ Small / open model — fraction of the cost
Complex reasoning, drafting, strategy10–20%
→ Frontier model — pay for what you actually need
A typical SMB request mix once routing is in place

The real cost of naïve AI wiring

Most teams start with something like: “All AI requests go to the frontier model — it's the best.” This feels responsible. It isn't.

Signs you're over-routing to frontier models
  • Opaque cost spikes at end of month with no clear culprit
  • Simple FAQ responses taking 2–3 seconds when they could be instant
  • Paying for reasoning headroom you never actually use
  • Every retry and edge case runs on the most expensive tier

Routing fixes this at the architecture level, not the spreadsheet level. Once it's in place, you stop debugging AI costs and start predicting them.

What “routing” actually means

Think of routing as a traffic cop sitting between your users and your models. A simple request — “What are your office hours?” — goes to a small, cheap model. A complex one — “Draft a proposal for a 3-month pilot with 12 stakeholders” — goes to the frontier model. You're not forcing everything through one model; you're sending the right task to the right model at the right price.

How to route intelligently (without overengineering)

You don't need FinTech-grade AI orchestration software. For SMBs, good routing looks like three filters applied in order.

Routing decision — three filters in order
01
Task complexity
Simple / repeatable → small modelComplex reasoning → frontier
02
Sensitivity & privacy
Internal / sensitive → self-hostedPublic-facing → frontier API
03
Cost ceiling
Set a max cost per call (e.g. $0.005). If the smaller model delivers, it wins. Only escalate when quality requires it.
Apply these filters in sequence — most requests never reach filter 3

Where routing fits in your stack

The routing layer is thin by design. It sits between your application and the model pool — invisible to your users, inexpensive to maintain, and easy to tune as your usage patterns shift.

Your AI stack
User request
Any channel — chat, API, form
Routing layer
Classifies, decides, dispatches
Model pool
Frontier · Open · Self-hosted
Response
Fast, consistent, cost-controlled
The routing layer is the only new piece — everything else already exists

How to start with routing today

You don't need to build everything from scratch. Start with two or three clear task buckets, pick one frontier model for the heavy-lifting bucket, pick one cheaper model for the rest, and route via a simple conditional in your code or workflow engine.

01
Define 2–3 task buckets
FAQs and triage / drafting and strategy / internal and sensitive. Three buckets covers most SMB use cases.
02
Pick one frontier model
For the bucket that genuinely needs it. Don't debate which one — pick the one your team already knows.
03
Pick one or two cheaper models
Open weights or a smaller API model. Benchmark against your real tasks, not synthetic benchmarks.
04
Route via a simple conditional
A function that looks at task type and cost ceiling and returns a model ID. Ship it. Tune later.

Eventually, routing becomes a silent, revenue-protecting component of your platform — like a well-tuned HVAC system: nobody thinks about it until it stops working.

How Agnotiq approaches this

We pre-wire routing into every workflow we build. Frontier models handle complex orchestration and reasoning; smaller, open models handle repetitive, high-volume tasks. You get the best of both worlds without the cost of using frontier models for everything — and without spending a week debugging why your AI bill doubled.

Route the request. Pay for what you need.

Let's build

Have a workflow that deserves an agent?

Tell us what's eating your team's afternoons. We'll come back inside three days with a discovery plan, a price, and the names of the engineers we'd put on it.