The Ultracode Field Guide: When 16 Agents Beat One Pass

Ground truth
First, fix one thing the hype gets wrong

Ultracode is not a new model and not a secret API setting. Anthropic's own effort docs say it plainly: ultracode appears in Claude Code's effort menu, but it is not an additional API effort level. The real API ladder has only five rungs: low, medium, high (the default), xhigh, and max. What /effort ultracode actually does is pair the xhigh rung with standing permission for Claude Code to launch multi-agent workflows on its own. So when you toggle it, you are not buying a smarter brain. You are handing Claude a budget and the keys to spin up a swarm. That distinction is the whole game, because the swarm is where the money goes.

Workflows need a recent build of Claude Code on a paid plan (Pro/Max/Team/Enterprise); run /version to confirm yours is current. On Pro you also have to switch on the Dynamic workflows row in /config first. ultracode only appears in /effort on models that support the xhigh rung; the effort-levels doc lists which models qualify.

Why it can beat a single pass
The actual difference: who holds the plan

The video said the agents argue. True, but the deeper reason orchestration wins is structural, and it is the one thing the short can't show you: where the plan and the half-finished work live. A normal chat, subagents, even agent teams all keep intermediate results in Claude's context window — so a long job competes with itself for attention and drifts. A workflow moves the loop, the branching, and every intermediate result into a JavaScript script the runtime executes outside the conversation. Claude's context ends up holding only the final answer. That is why a 500-file pass doesn't forget what it found in file 30 by the time it reaches file 480.
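To make that concrete, here is a minimal Python sketch of the idea. It is not the workflow runtime's API (real workflows are JavaScript scripts, and nothing here is taken from them); WorkflowRun, check_file, and the admin.py rule are hypothetical stand-ins. It only shows per-file results accumulating in script variables while a one-line summary is all that returns to the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowRun:
    """Hypothetical stand-in for a workflow script's state (not a real API)."""

    findings: dict[str, list[str]] = field(default_factory=dict)

    def check_file(self, path: str) -> list[str]:
        # Placeholder for one agent's work; a real run would call a model here.
        return [f"{path}: missing auth check"] if path.endswith("admin.py") else []

    def run(self, paths: list[str]) -> str:
        # The loop and every intermediate result live in script variables,
        # not in the conversation's context window.
        for path in paths:
            self.findings[path] = self.check_file(path)
        flagged = sorted(p for p, issues in self.findings.items() if issues)
        # Only this short summary goes back to the conversation.
        return f"{len(flagged)} of {len(paths)} files flagged: {', '.join(flagged)}"


paths = [f"src/routes/r{i}.py" for i in range(499)] + ["src/routes/admin.py"]
run = WorkflowRun()
summary = run.run(paths)
print(summary)  # 1 of 500 files flagged: src/routes/admin.py
```

All 500 per-file results stay in run.findings; the conversation would only ever see the summary string.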

Aspect	Subagents	Skills	Agent teams	Workflows (ultracode)
Who decides what runs next	Claude, turn by turn	Claude, per the prompt	Lead agent, turn by turn	The script
Where mid-run results live	Context window	Context window	Shared task list	Script variables (off-context)
What's repeatable	The worker def	The instructions	The team def	The orchestration itself
Scale	A few per turn	Same as subagents	A handful of peers	Dozens to hundreds per run
If you interrupt	Restarts the turn	Restarts the turn	Teammates keep running	Resumable in the same session

The quality gain here comes from more than a higher agent count. It comes from having independent agents adversarially review each other's findings, or draft a plan from several angles and weigh them, before anything reaches you. That's a repeatable check a single pass can't run on itself.
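One simple way to picture that cross-check is a quorum vote: a finding reaches you only if enough independent reviewers reported it. The sketch below is an illustration under that assumption; surviving_findings, the quorum rule, and the sample findings are hypothetical, and the guide does not say the runtime tallies votes this way.

```python
from collections import Counter


def surviving_findings(reviews: list[set[str]], quorum: int) -> list[str]:
    """Keep findings that at least `quorum` independent reviewers reported."""
    # Each review is a set, so one reviewer can count a finding only once.
    counts = Counter(finding for review in reviews for finding in review)
    return sorted(f for f, n in counts.items() if n >= quorum)


# Three hypothetical reviewers looked at the same routes independently.
reviews = [
    {"GET /users lacks auth", "POST /login leaks timing"},
    {"GET /users lacks auth"},
    {"GET /users lacks auth", "DELETE /items lacks auth"},
]
confirmed = surviving_findings(reviews, quorum=2)
print(confirmed)  # ['GET /users lacks auth']
```

Findings that only one reviewer raised are dropped here; a real run might instead send them to a second round of review.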

When 16 agents earn their keep
The decision rule: orchestrate or run one pass

Spawning a swarm costs meaningfully more tokens than the same task in a chat, so the question is never "is ultracode better?" but "does this task reward fanning out?" Here is the rule I use. A task is worth a workflow only if it clears both gates below. If it clears only one (or neither), /effort high gives you the same answer for a fraction of the bill.

Gate 1 — Breadth or stakes. Is the work spread across many files/sources (a codebase-wide sweep, a big migration, multi-source research), OR is being wrong genuinely expensive (a security audit, a plan you'll commit weeks to)? If neither, stop — use /effort high.
Gate 2 — Does it reward cross-checking? Would independent angles or adversarial review actually change the answer? Research that needs sources weighed against each other: yes. Re-running a deterministic refactor: no, one pass is fine.
Both gates clear -> run a workflow. Good fits: codebase-wide bug/auth sweeps, a 500-file migration, multi-source research where claims must survive cross-checking, drafting a hard plan from several independent angles.
Only one (or zero) clears -> /effort high. Single-file edits, quick lookups, and strict serial A->B->C work where each step just needs the last one's output get nothing from the orchestration tax.

Quick tell: if you can describe the task as one thing done once, it's a single pass. If it's the same check run across many things, then reconciled, it's a workflow.
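The two gates reduce to a small boolean rule. This is just the rule above written out as a Python function; choose_mode and its parameter names are made up for illustration and are not part of Claude Code.

```python
def choose_mode(many_files: bool, high_stakes: bool, rewards_cross_checking: bool) -> str:
    """Apply the two-gate rule: a workflow only if both gates clear."""
    gate1 = many_files or high_stakes   # Gate 1: breadth or stakes
    gate2 = rewards_cross_checking      # Gate 2: cross-checking changes the answer
    return "workflow" if gate1 and gate2 else "/effort high"


# Codebase-wide auth sweep: wide, and review can change the answer.
print(choose_mode(many_files=True, high_stakes=True, rewards_cross_checking=True))
# Deterministic refactor across many files: wide, but nothing to cross-check.
print(choose_mode(many_files=True, high_stakes=False, rewards_cross_checking=False))
```

The first call returns "workflow" and the second returns "/effort high", matching the good-fit and bad-fit examples in the list.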

Steer the bill before you hit go
The 3 settings that cap the spend, each named with where it lives

The token bill is the only real downside, and it's entirely yours to prevent. These are the three controls that actually move the number, not vibes. Set them before the run, not after the invoice.

1) Model — set in /model, checked before the run. Every agent in a workflow uses your session's model unless the script routes a stage elsewhere. So a swarm running on Opus is the priciest possible shape. Run /model first; if you normally code on a smaller model, stay on it, or tell Claude in your prompt to use a smaller model for the stages that don't need the strongest one. This is the single biggest lever — model choice multiplies across every one of the dozens-to-hundreds of agents.
2) Scope — controlled in your prompt + watched in /workflows. Run on a slice first: one directory instead of the whole repo, one narrow question instead of a broad one. The /workflows progress view shows each phase's agent count, token total, and elapsed time live; press p to pause/resume, x to stop the whole run, and you keep every completed agent's result. Stop the moment tokens outpace value.
3) The hard caps + the approval gate — built into the runtime, surfaced at launch. The runtime bounds a runaway script at 16 concurrent agents (fewer on low-core machines) and 1,000 agents total per run — that's the ceiling, not a target. Before any run, the approval card lists the planned phases and a token-usage caution; choose View raw script (or Ctrl+G to open it in your editor) to see the plan before you spend a token. In Default/accept-edits mode you get this prompt every run unless you opted into don't ask again for that workflow.
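To see how the two hard caps bound a run, here is a minimal Python sketch using a semaphore. The numbers 16 and 1,000 come from the text above; everything else (run_capped, the placeholder agent, and refusing an oversized run up front) is an assumption of this sketch, since the guide does not describe how the runtime enforces the limits internally.

```python
import asyncio

MAX_CONCURRENT = 16  # concurrency cap quoted in the guide
MAX_TOTAL = 1000     # total-agents-per-run cap quoted in the guide


async def run_capped(tasks: list[str]) -> tuple[int, int]:
    """Run tasks under both caps; return (agents_run, peak_concurrency)."""
    if len(tasks) > MAX_TOTAL:
        # Sketch-only choice: reject the run rather than truncate it.
        raise ValueError(f"{len(tasks)} agents exceeds the cap of {MAX_TOTAL}")
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    active = 0
    peak = 0

    async def agent(task: str) -> None:
        nonlocal active, peak
        async with gate:  # at most MAX_CONCURRENT agents hold the gate at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # placeholder for real agent work
            active -= 1

    await asyncio.gather(*(agent(t) for t in tasks))
    return len(tasks), peak


agents_run, peak = asyncio.run(run_capped([f"file-{i}" for i in range(100)]))
print(agents_run, peak)
```

However many tasks the script queues, the peak never exceeds MAX_CONCURRENT, and a script that tries to queue more than MAX_TOTAL agents fails before any work starts. That is the sense in which the caps bound the worst case.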

Two more guards worth knowing. First, add the shell/web/MCP commands your agents need to your allowlist before a long run; otherwise the run pauses mid-flight for permission. Second, to turn workflows off entirely, set "disableWorkflows": true in ~/.claude/settings.json or set the environment variable CLAUDE_CODE_DISABLE_WORKFLOWS=1. If you only want to stop the keyword firing by accident, toggle off Ultracode keyword trigger in /config.

What the trade-off looks like in practice
Worked example: cost vs quality on the same task

Take one real task — audit every endpoint under src/routes/ for missing auth checks — and run it three ways. The numbers below are about shape and direction, not a promise of exact counts (your repo size, model, and plan move them). The point is to show where each option pays off and where it just burns tokens.

Approach	How it runs	Relative cost	Best when
`/effort high`, single pass	One serial pass reads files in sequence; results stay in context	Lowest	Small route folder; you mostly trust one careful read
`xhigh` (no workflow)	Deeper per-step reasoning, still one agent; expect meaningfully higher tokens than high	Medium	Tricky logic in a few files where the reasoning is the hard part
`ultracode` workflow	Fans out across routes; agents cross-check findings, vote, and report the survivors	Highest	Wide route surface where a missed auth check is expensive to ship

Anthropic's own guidance: start coding/agentic work at xhigh, keep high as the floor for most intelligence-sensitive work, and reserve max for genuinely frontier problems. On most tasks max adds real cost for small gains and can even overthink structured output. When you do run xhigh/ultracode, give the run a generous max_tokens ceiling so the swarm has room to think and act. There's no official number here, so treat any specific figure you see floating around as community practice, not Anthropic guidance, and tune it to your own runs.

Try orchestration without flipping the session
The lowest-risk way to feel it: /deep-research

If you want the ultracode experience without setting /effort ultracode on the whole session, run the one workflow Anthropic ships in the box. /deep-research <question> fans web searches across several angles, fetches and cross-checks the sources it finds, votes on each claim, and hands back a cited report with the claims that didn't survive cross-checking already filtered out. It's the cleanest demo of the adversarial-verification pattern, scoped to one question.

Run it: /deep-research What changed in the Node.js permission model between v20 and v22?
Approve the plan when Claude Code asks (it shows the phases first).
Watch with /workflows -> arrow to the run -> Enter. You'll see agent count, tokens, and elapsed time per phase.
Read the cited report when it lands. Requires the WebSearch tool to be available.

Like every workflow, when a run does what you wanted you can press s in /workflows to save its script as a /command — to your project's .claude/workflows/ (shared with the repo) or ~/.claude/workflows/ (just you). It then runs as /<name> and can take input via an args global.

Frequently asked questions

Is ultracode a smarter model than /effort high?

No. Per Anthropic's effort docs, ultracode is not an API effort level at all — it's the xhigh reasoning level plus standing permission for Claude Code to launch multi-agent workflows on its own. The model is the same; what changes is that Claude can now spin up a swarm without asking each time.

What's the difference between typing ultracode in a prompt and setting /effort ultracode?

The keyword ultracode in one prompt runs just that single task as a workflow and leaves your effort level untouched. /effort ultracode makes Claude decide, for every substantive task the rest of the session, whether to orchestrate. It lasts the session and resets when you start a new one — drop back with /effort high for routine work. (On some builds the literal trigger keyword has been workflow rather than ultracode; plain natural-language requests like "run a workflow" work either way.)

How many agents can actually run, and can a script run away?

The runtime caps concurrency at 16 agents at once (fewer on machines with limited CPU cores) and 1,000 agents total per run. Those are guardrails against a runaway loop, not a goal — the docs put realistic runs in the dozens-to-hundreds range. The cap bounds the worst-case cost of a bad script.

Which single setting cuts cost the most?

Your model in /model. Every agent uses the session's model unless the script routes a stage elsewhere, so the model choice multiplies across the whole swarm. Running a big workflow on a smaller model, or telling Claude to use a smaller model for the stages that don't need the strongest one, moves the bill more than anything else you can do.
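The multiplication is easy to see as arithmetic. In the sketch below every number is invented for illustration (agent count, tokens per agent, and both per-million-token prices); none of them are real Anthropic prices or measured usage, and run_cost is a hypothetical helper.

```python
def run_cost(agents: int, tokens_per_agent: int, price_per_million: float) -> float:
    """Rough cost of a run: total tokens times the model's price per million."""
    return agents * tokens_per_agent * price_per_million / 1_000_000


# Made-up inputs: 200 agents at 50,000 tokens each is 10 million tokens.
larger_model = run_cost(agents=200, tokens_per_agent=50_000, price_per_million=15.0)
smaller_model = run_cost(agents=200, tokens_per_agent=50_000, price_per_million=3.0)
print(larger_model, smaller_model)  # 150.0 30.0
```

The per-token price difference is applied to every agent, so the gap between the two totals grows in step with the size of the swarm.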

Can I stop a workflow partway and not lose the work?

Yes, within the same session. In /workflows, p pauses/resumes and x stops a run. When you resume, agents that already finished return their cached results and only the rest run live. The catch: if you quit Claude Code entirely while a workflow is running, the next session starts it fresh.
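The resume behavior works like a per-session cache. Here is a minimal Python sketch of that idea, assuming results are keyed by task name; ResumableRun, stop_after, and live_calls are hypothetical and only mimic the behavior described above, not the runtime's actual mechanism.

```python
from typing import Optional


class ResumableRun:
    """Hypothetical sketch: finished agents' results are cached for the session."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.live_calls = 0  # how many agents actually ran live

    def _agent(self, task: str) -> str:
        self.live_calls += 1
        return f"result for {task}"  # placeholder for real agent output

    def run(self, tasks: list[str], stop_after: Optional[int] = None) -> list[str]:
        results = []
        for i, task in enumerate(tasks):
            if stop_after is not None and i >= stop_after:
                break  # simulates stopping or pausing partway
            if task not in self.cache:
                self.cache[task] = self._agent(task)  # only uncached work runs live
            results.append(self.cache[task])
        return results


tasks = ["a", "b", "c", "d", "e"]
session = ResumableRun()
session.run(tasks, stop_after=2)  # interrupted after two agents
full = session.run(tasks)         # resume: two cached, three live
print(session.live_calls)         # 5
```

Creating a fresh ResumableRun() mirrors quitting Claude Code: the cache is empty, so every agent runs live again.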

Do I have to approve what it's about to do?

In Default and accept-edits modes, yes: every run shows an approval card with the planned phases and a token-usage caution, and you can pick View raw script (or Ctrl+G) to read the plan first. In Auto mode you're prompted on first launch only. In bypass mode, claude -p, and the Agent SDK there's no prompt and the run starts immediately. Note that the subagents always run in acceptEdits and inherit your tool allowlist regardless of session mode, so add the shell/web commands they need beforehand to avoid mid-run pauses.

When is xhigh the right call instead of full ultracode?

When the task is hard reasoning in a few files, not breadth. xhigh gives you the deep per-step thinking (Anthropic recommends it as the starting point for coding/agentic work) without spawning a swarm. Reach for an actual workflow only when the task is wide or needs cross-checking. And reserve max for genuinely frontier problems — on most work it just adds cost.

How do I reuse a workflow I liked?

In /workflows, select the run and press s. Save it to .claude/workflows/ (shared via the repo) or ~/.claude/workflows/ (just you). It becomes /<name> in future sessions and can accept input through an args global — so you can pass a question or a list of paths at call time instead of editing the script.

Sources · Orchestrate subagents at scale with dynamic workflows — Claude Code Docs · Effort parameter (effort levels; ultracode = xhigh + workflow permission) — Claude API Docs · Introducing dynamic workflows in Claude Code — Anthropic · Claude just dropped UltraCode... its Insane — Jack Roberts · Claude Just Dropped ULTRA CODE (5 min) — Tristen O'Brien · Claude Just Dropped ULTRA CODE — Tristen O'Brien
