Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

Most builders quote a client a flat monthly number pulled out of thin air, then quietly hope the token bill stays under it. That works until the agent gets used. The fix is boring and it works: figure out what ONE run of the agent actually costs in tokens, multiply by how often it runs, add your margin, and quote from that. This guide shows the exact arithmetic — with real 2026 prices — and the one trick (routing tasks to the right-sized model) that cuts the bill the most. Cloudflare runs this at scale internally: across its first 30 days, its AI code-review system did 131,246 review runs over 48,095 merge requests, and the median run cost $0.98 (average $1.19). That's the whole game in one number — a known, predictable cost per task.

A flat quote with no cost model is a bet, not a price.
The unit that matters is cost PER TASK (per call, per review, per conversation), not per month.
Once you know cost-per-task, the monthly number falls out of usage — and so does your margin.

The only formula you need

Token billing is simple once you see it. Every model has two prices: one for the tokens you send IN (the prompt, the context, the data), one for the tokens it sends OUT (the answer). You pay per million tokens. So the cost of one agent task is:

Cost per task = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price)
Input tokens = your system prompt + the user's request + any context/data you stuff in (this is usually the BIG number).
Output tokens = what the model writes back (usually much smaller, but priced ~5× higher per token).
Rough rule: 1 token ≈ 4 characters ≈ 0.75 of a word. A page of text is ~500–700 tokens.

Output is priced about 5× input on every current Claude model, so a chatty agent that writes long answers costs more than the input size alone suggests. Watch the output.

Real 2026 prices (verified)

Here's what three Claude tiers actually cost as of June 2026, straight from Anthropic's pricing page. Cheap, mid, and frontier. The gap between them is the entire reason routing works — the cheapest model is 5× cheaper than the frontier one for the same tokens.

Model	Input ($/1M tokens)	Output ($/1M tokens)	Use it for
Claude Haiku 4.5	$1	$5	Easy, high-volume calls — classify, extract, route, short replies
Claude Sonnet 4.6	$3	$15	The everyday workhorse — most real tasks
Claude Opus 4.8	$5	$25	The hard 20% — tricky reasoning, the calls you can't get wrong

Prices change. Always re-check the vendor's own pricing page before you quote a client — these are Anthropic's published rates on 2026-06-10. There's also a 90%-off cache-read price for repeated context, which we'll use below.

Worked example: pricing one agent task

Say your agent reviews a chunk of work for a client — a support ticket, a contract clause, a code diff, whatever. A realistic task sends ~30,000 input tokens (instructions + the thing to review + some context) and writes ~3,000 output tokens back. Same task, three models:

Model	Input cost	Output cost	Cost per task
Haiku 4.5	30k × $1/1M = $0.030	3k × $5/1M = $0.015	$0.045
Sonnet 4.6	30k × $3/1M = $0.090	3k × $15/1M = $0.045	$0.135
Opus 4.8	30k × $5/1M = $0.150	3k × $25/1M = $0.075	$0.225

These are illustrative token volumes (30k in / 3k out) priced at Anthropic's real published rates. Your token counts will differ — measure your own agent on 10 real runs and average them. But the SHAPE holds: the frontier model costs ~5× the cheap one for the identical task.

The move that cuts the bill: model-routing

Here's the lever. Most tasks are easy. A few are hard. If you send everything to the frontier model 'to be safe', you pay frontier prices for work a cheap model would nail. Instead, route: a cheap model (or a small classifier) decides if a task is easy or hard, then easy tasks go to Haiku and only the hard ones go to Opus. Cloudflare does exactly this — it reserves its top-tier models (Claude Opus 4.7 / GPT-5.4) for the coordinator that orchestrates the review, and runs the bulk 'sub-reviewer' work on standard-tier models (Claude Sonnet 4.6 / GPT-5.3 Codex), with a lightweight model for text-heavy odds and ends. Same pattern, any scale. Watch what it does to 1,000 tasks a month:

Strategy	Math	Monthly cost
Everything on Opus 4.8	1,000 × $0.225	$225
Routed: 80% Haiku + 20% Opus	(800 × $0.045) + (200 × $0.225)	$81
You just saved	$225 − $81	$144 (64% less)

The routing decision itself costs almost nothing — a one-line Haiku classification is a fraction of a cent. Two more free wins: cache the parts of your prompt that don't change (cached input reads are 90% off), and don't let the agent ramble (output is the expensive side).

Turn cost-per-task into a client price

Now you have a real cost floor, build the quote on top of it instead of guessing. Four steps:

1. Measure: run your agent on ~10 real tasks, average the input + output tokens, get your true cost-per-task (use the formula above).
2. Estimate volume: ask the client how many runs/month they expect, then assume MORE — usage always grows once it works. Price for the higher number.
3. Add a safety multiple: multiply your raw token cost by 3–5× to cover retries, longer-than-expected tasks, support, and your margin. Token cost should be a small slice of what you charge, not the whole price.
4. Quote a plan with a ceiling: e.g. 'up to 1,000 runs/month included, then $X per 100 after'. Now overage is the client's choice, not your loss.

Bill the client on a clean unit they understand — a credit, a minute, a 'run' — not raw tokens. They should never see the word 'token'. You absorb the routing complexity; they see one predictable number.

The 6-point tokenomics checklist

Run this before you send any AI-agent quote:

Do I know my agent's average input AND output tokens from REAL runs (not a guess)?
Have I priced one task on the cheap, mid, AND frontier model so I know the spread?
Am I routing — easy tasks to a cheap model, only the hard ones to the frontier?
Am I caching the static part of my prompt (90% off on repeat reads)?
Did I check the vendor's live pricing page TODAY, not from memory?
Is my client price a clean unit (credit/minute/run) with a ceiling, at a 3–5× multiple over raw cost?

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

Frequently asked questions

How do I estimate the token cost of an AI agent before I build it?

Take your best guess at input tokens (system prompt + the data/context you'll feed it + the user's request) and output tokens (the answer length), then apply the formula: <code>(input ÷ 1,000,000 × input price) + (output ÷ 1,000,000 × output price)</code>. A page of text is roughly 500–700 tokens. Once a prototype exists, replace the guess by measuring 10 real runs and averaging — that's your true cost-per-task.

What is model-routing and why does it cut cost so much?

Routing means sending each task to the smallest model that can do it well: easy/high-volume calls go to a cheap model (e.g. Claude Haiku 4.5 at $1/$5 per million tokens), and only the hard calls go to the frontier model (e.g. Opus 4.8 at $5/$25). Because the cheap model is ~5× cheaper, and most tasks are easy, the blended cost drops sharply. In the guide's example, routing 80% of tasks to Haiku cut a $225/month bill to $81 — 64% less.

Is the $0.98 per-review Cloudflare number real?

Yes — it's from Cloudflare's own engineering blog ("Orchestrating AI code review at scale"). Across the first 30 days (March 10 – April 9, 2026), their internal AI code-review system ran 131,246 reviews over 48,095 merge requests, with a median cost of $0.98 per review and an average of $1.19. They route work across model tiers — top-tier models coordinate, standard-tier models do the bulk reviewing. It's a clean real-world anchor for 'known cost per task'.

Should I charge clients per token?

No. Clients don't think in tokens and a per-token bill feels unpredictable and scary. Charge on a clean unit they understand — a credit, a per-minute rate, or a 'run' — with a monthly ceiling and a clear overage price. You handle the token math and routing behind the scenes; they get one predictable number. Set your unit price at roughly 3–5× your raw token cost to cover retries, support, and margin.

Why is output so much more expensive than input?

On every current Claude model, output tokens are priced about 5× the input rate (Haiku $1 in / $5 out, Opus $5 in / $25 out). Generating text is more compute-intensive than reading it. The practical takeaway: a verbose agent that writes long answers can cost more than its large input suggests, so cap output length and ask for concise responses where you can.

Sources · Orchestrating AI Code Review at scale — Cloudflare Blog · Pricing — Anthropic / Claude API Docs

Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

The only formula you need

Real 2026 prices (verified)

Worked example: pricing one agent task

The move that cuts the bill: model-routing

Turn cost-per-task into a client price

The 6-point tokenomics checklist

Get the next drop

Frequently asked questions

The hard part isn't the math — it's billing the client cleanly

Grab the AI Reseller Starter Kit

Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

The only formula you need

Real 2026 prices (verified)

Worked example: pricing one agent task

The move that cuts the bill: model-routing

Turn cost-per-task into a client price

The 6-point tokenomics checklist

Get the next drop

Frequently asked questions

More free guides

The hard part isn't the math — it's billing the client cleanly

Grab the AI Reseller Starter Kit