Why most people price AI agents wrong
Most builders quote a client a flat monthly number pulled out of thin air, then quietly hope the token bill stays under it. That works until the agent gets used. The fix is boring and it works: figure out what ONE run of the agent actually costs in tokens, multiply by how often it runs, add your margin, and quote from that. This guide shows the exact arithmetic — with real 2026 prices — and the one trick (routing tasks to the right-sized model) that cuts the bill the most. Cloudflare runs this at scale internally: across its first 30 days, its AI code-review system did 131,246 review runs over 48,095 merge requests, and the median run cost $0.98 (average $1.19). That's the whole game in one number — a known, predictable cost per task.
- A flat quote with no cost model is a bet, not a price.
- The unit that matters is cost PER TASK (per call, per review, per conversation), not per month.
- Once you know cost-per-task, the monthly number falls out of usage — and so does your margin.
The only formula you need
Token billing is simple once you see it. Every model has two prices: one for the tokens you send IN (the prompt, the context, the data), one for the tokens it sends OUT (the answer). You pay per million tokens. So the cost of one agent task is:
- Cost per task = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price)
- Input tokens = your system prompt + the user's request + any context/data you stuff in (this is usually the BIG number).
- Output tokens = what the model writes back (usually much smaller, but priced ~5× higher per token).
- Rough rule: 1 token ≈ 4 characters ≈ 0.75 of a word. A page of text is ~500–700 tokens.
Real 2026 prices (verified)
Here's what three Claude tiers actually cost as of June 2026, straight from Anthropic's pricing page. Cheap, mid, and frontier. The gap between them is the entire reason routing works — the cheapest model is 5× cheaper than the frontier one for the same tokens.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Use it for |
|---|---|---|---|
| Claude Haiku 4.5 | $1 | $5 | Easy, high-volume calls — classify, extract, route, short replies |
| Claude Sonnet 4.6 | $3 | $15 | The everyday workhorse — most real tasks |
| Claude Opus 4.8 | $5 | $25 | The hard 20% — tricky reasoning, the calls you can't get wrong |
Worked example: pricing one agent task
Say your agent reviews a chunk of work for a client — a support ticket, a contract clause, a code diff, whatever. A realistic task sends ~30,000 input tokens (instructions + the thing to review + some context) and writes ~3,000 output tokens back. Same task, three models:
| Model | Input cost | Output cost | Cost per task |
|---|---|---|---|
| Haiku 4.5 | 30k × $1/1M = $0.030 | 3k × $5/1M = $0.015 | $0.045 |
| Sonnet 4.6 | 30k × $3/1M = $0.090 | 3k × $15/1M = $0.045 | $0.135 |
| Opus 4.8 | 30k × $5/1M = $0.150 | 3k × $25/1M = $0.075 | $0.225 |
The move that cuts the bill: model-routing
Here's the lever. Most tasks are easy. A few are hard. If you send everything to the frontier model 'to be safe', you pay frontier prices for work a cheap model would nail. Instead, route: a cheap model (or a small classifier) decides if a task is easy or hard, then easy tasks go to Haiku and only the hard ones go to Opus. Cloudflare does exactly this — it reserves its top-tier models (Claude Opus 4.7 / GPT-5.4) for the coordinator that orchestrates the review, and runs the bulk 'sub-reviewer' work on standard-tier models (Claude Sonnet 4.6 / GPT-5.3 Codex), with a lightweight model for text-heavy odds and ends. Same pattern, any scale. Watch what it does to 1,000 tasks a month:
| Strategy | Math | Monthly cost |
|---|---|---|
| Everything on Opus 4.8 | 1,000 × $0.225 | $225 |
| Routed: 80% Haiku + 20% Opus | (800 × $0.045) + (200 × $0.225) | $81 |
| You just saved | $225 − $81 | $144 (64% less) |
Turn cost-per-task into a client price
Now you have a real cost floor, build the quote on top of it instead of guessing. Four steps:
- 1. Measure: run your agent on ~10 real tasks, average the input + output tokens, get your true cost-per-task (use the formula above).
- 2. Estimate volume: ask the client how many runs/month they expect, then assume MORE — usage always grows once it works. Price for the higher number.
- 3. Add a safety multiple: multiply your raw token cost by 3–5× to cover retries, longer-than-expected tasks, support, and your margin. Token cost should be a small slice of what you charge, not the whole price.
- 4. Quote a plan with a ceiling: e.g. 'up to 1,000 runs/month included, then $X per 100 after'. Now overage is the client's choice, not your loss.
The 6-point tokenomics checklist
Run this before you send any AI-agent quote:
- Do I know my agent's average input AND output tokens from REAL runs (not a guess)?
- Have I priced one task on the cheap, mid, AND frontier model so I know the spread?
- Am I routing — easy tasks to a cheap model, only the hard ones to the frontier?
- Am I caching the static part of my prompt (90% off on repeat reads)?
- Did I check the vendor's live pricing page TODAY, not from memory?
- Is my client price a clean unit (credit/minute/run) with a ceiling, at a 3–5× multiple over raw cost?
Get the next drop
New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.