Kno2gether Start free
Field Guide

Nested Subagents: Claude Code's Depth-5 Upgrade (and 5 Patterns That Actually Use It)

Your AI agent now spawns its own agents — up to five layers deep. Here's what that really means for your context window, the five workflows it unlocks, and how to switch it on in Claude Code and Codex.

See how Knotie works
01

Why your main chat keeps getting dumber

Here's the thing nobody warns you about: an AI coding agent gets worse the longer you talk to it. Not because the model is weak — because its context window fills with noise. Every file it reads, every search it runs, every test log it dumps stays in the conversation, crowding out the decisions that actually matter. By the time it goes to edit code, it's reasoning over a haystack. Subagents are the fix. A subagent does the noisy work — searching the codebase, reading docs, checking whether a library is sane — off in its own separate context window, then hands back only the summary. The important reasoning happens early, while the main window is still lean and high-signal. Claude Code already does this on its own for big jobs. The upgrade is what those subagents can now do themselves.

  • A full context window = worse decisions on the edits and commands that count
  • A subagent runs noisy work in an isolated window and returns only the summary
  • So the main conversation stays short, sharp, and high-signal
02

What actually changed: subagents can now nest

Until now there was a hard floor: a subagent could not spawn its own subagents. Delegation was one level deep and flat. Around June 9, 2026, Anthropic's Claude Code lead, Boris Cherny, announced nested subagent support — a subagent can now delegate to a deeper subagent, and so on, capped at five layers. (Worth being precise: this is a fresh rollout. The static Claude Code subagents doc still reads the old way, and Cherny framed the depth-5 cap as a starting point he wants feedback on — so treat 'five' as today's ceiling, not gospel.) Why does the ceiling matter? Because each layer gets its own clean window. A layer-1 subagent that hits something messy can offload it to layer 2 instead of polluting its own context — so the answer that bubbles back up to your main session is clean at every level. Codex shipped the same idea earlier as a config knob: agents.max_depth, which defaults to 1.

  • Old rule: subagents could NOT spawn subagents (one flat layer)
  • New: nested delegation up to 5 layers deep — each layer its own fresh window
  • Attributed to Boris Cherny (Claude Code lead, Anthropic), ~Jun 9 2026 — early rollout, cap may move
  • Codex equivalent: set agents.max_depth above 1 in config (default is 1)
03

The mental model: fan out, then collapse

Forget 'layers' for a second and picture a funnel turned on its side. Your main session asks one big question. It fans that question out into parallel subagents — one per chunk of work. Each of those, if it hits something noisy, fans out AGAIN into its own helpers. Work explodes outward, runs in parallel, then collapses back inward: deep results summarize up to their parent, parents summarize up to yours. You spend almost none of your main context, and you get an answer that was actually investigated in depth instead of skimmed. That single shape — fan out, then collapse — is what every pattern below is built on.

  • Fan out: split the job so each subagent owns one clean slice
  • Go deeper only where it's noisy: let a slice spawn its own helpers
  • Collapse: every layer returns a summary up to its parent, not raw output
  • Net effect: deep investigation, tiny main-context cost
04

Pattern 1 — Research & claim-verification fan-out

You have a folder of articles, or ten sources, and you want the big claims actually checked — not vibe-checked. Hand one source to each layer-1 subagent. Its job: extract the claims worth checking. For each claim that matters, it spawns a layer-2 subagent to do the noisy verification — search your knowledge base, hit the web, cross-reference. All sources run in parallel; all the messy searching is buried in layer 2. What comes back to you is a short list: claim, verdict, evidence. This is the single most useful pattern for anyone doing research, due diligence, or fact-checking at volume.

  1. Prompt: 'Load each article one per subagent. Extract the biggest claims.'
  2. Then: 'For each important claim, spawn a subagent to verify it against my notes and the web.'
  3. Let it run all sources in parallel — the searching stays in the deeper layer
  4. You receive a clean claim → verdict → evidence table, not a wall of search results
05

Pattern 2 — The debugging cause-tree

This is the most technical payoff, and the one that catches bugs a flat agent misses. A layer-1 subagent loads a fresh incident and lists the plausible causes. It fans those out: one layer-2 subagent per suspected cause, each searching the codebase for evidence. When one of them smells something deeper, it spawns a layer-3 subagent to cross-check the server logs. Why this beats one big context window: shallow debugging finds the first thing that looks broken and stops. The cause-tree keeps a separate clean window for the hunch that there's a deeper root cause underneath the obvious symptom — so it surfaces the real bug, not just the first one. Then the whole tree collapses back and you fix it in your main session, which has barely been touched.

  1. Layer 1: 'Load this incident, list every plausible cause.'
  2. Layer 2: one subagent per cause, each hunts the codebase for evidence — in parallel
  3. Layer 3: any cause that looks deeper spawns a check against the logs / live system
  4. The tree collapses to a single root cause; you fix it without burning your main context
An incident node fanning into parallel investigation branches, one drilling down to a hidden root cause glowing beneath the surface
06

Pattern 3 — Second & third-order effects (blast radius)

Big decisions fail on the consequences nobody traced. Say you're working through ten contracts, or ten files, toward one goal. Give each one its own subagent to consider the direct implications. Then go a layer deeper: a layer-2 subagent measures the blast radius — if we change THIS, what else has to change? Any noisy digging it needs (old email threads, related clauses, dependent code) it pushes down to a layer-3 subagent. You're now reasoning about second- and third-order effects in parallel, across every item, without holding all of it in one head. For anything high-stakes, this routinely uncovers consequences a single pass would have shipped right past.

  1. One subagent per item (contract / file / decision) → direct implications of the goal
  2. Layer 2: 'measure the blast radius — what else must change if we do this?'
  3. Layer 3: delegate the noisy lookups (history, dependencies) so layer 2 stays clean
  4. Use it before any irreversible change — it's a cheap way to see around corners
07

Pattern 4 — Compare skills head-to-head, and improve them overnight

Two more moves that only work now that agents can nest. First, head-to-head comparison: load Skill A into one layer-1 subagent and Skill B into another, in parallel. Each spawns its own helpers to actually do the job — say, two competing code-review skills, each running its own review subagents — then you ask the main session which results are better. You get an honest, run-it-for-real verdict instead of reading two READMEs. Second, optimization loops: have a subagent run a skill, spawn helpers to grade the output, then rewrite the skill and run it again — version after version, overnight if you let it — until it's reliably better. If it's a skill you use every day, that's some of the best token spend you'll find.

  1. Compare: 'Run Skill A and Skill B in separate subagents, then tell me which output is better and why.'
  2. Each skill spawns its own subagents to do the real work — a fair fight, not a spec sheet
  3. Optimize: run skill → grade output in subagents → rewrite skill → repeat
  4. Point it at a skill you run daily; let it iterate to a consistently better version
08

How to switch it on (and when NOT to)

In Claude Code: update to the latest version — nested subagents arrived in the June rollout. The smartest models will start reaching for a deeper subagent on their own when a job needs it, but today you'll often get more by asking for it explicitly ('do the verification in separate subagents') or by designing your skills to delegate. In Codex: open your config and, under [agents], set max_depth to the number of levels you want (it defaults to 1, which blocks grandchildren). The honest caveat: nesting is not free. Every layer adds latency and tokens, and most quick tasks should stay flat. Reach for depth when the work is genuinely big and noisy — research at volume, real debugging, high-stakes decisions — and keep everyday edits to a single context.

  1. Claude Code: update to latest; prompt nesting explicitly or bake it into your skills
  2. Codex: in config under [agents], raise max_depth above its default of 1
  3. Default to flat — only nest when the job is big and noisy enough to earn it
  4. Watch latency and token cost: depth is a power tool, not a default setting

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

Frequently asked questions

What are nested subagents in Claude Code?
Nested subagents let a subagent spawn its own subagents, instead of delegation being a single flat layer. Each subagent still runs in an isolated context window and returns only a summary — nesting just means a subagent that hits noisy work can offload it to a deeper subagent, keeping every level clean. Support was announced by Anthropic's Claude Code lead Boris Cherny around June 9, 2026, capped at five layers deep.
How deep can subagents nest?
The current cap is five layers. Boris Cherny presented that ceiling as a starting point and explicitly asked for feedback on whether it's too tight, so it may change. In Codex the equivalent limit is the agents.max_depth config value, which defaults to 1.
Why not just use one big context window?
Because context windows degrade as they fill — an agent reasoning over thousands of lines of logs and file dumps makes worse decisions on the edits that matter. Subagents push that noise into separate windows and return only the summary, so the main conversation stays short and high-signal. Nesting extends that benefit to every layer.
How do I enable nested subagents?
In Claude Code, update to the latest version (nesting shipped in the June 2026 rollout) and either prompt for it explicitly ('run the verification in separate subagents') or design your skills to delegate. In Codex, set max_depth above 1 under the [agents] section of your config.
When should I NOT nest subagents?
Most of the time. Each layer adds latency and token cost, so everyday edits and quick tasks should stay flat. Reach for nesting only when the work is genuinely big and noisy — research at volume, real debugging, or high-stakes decisions where missing a second-order effect is expensive.
Sources · Anthropic Just Dropped the Biggest Subagent Upgrade Yet — Ray Amjad · Create custom subagents — Claude Code Docs · Boris Cherny announces nested subagent support (depth 5) · Subagents & configuration reference — OpenAI Codex (agents.max_depth)

Orchestrating agents is a skill. Businesses pay for the result.

Look at what these patterns really teach: how to make a fleet of AI agents investigate, verify, and report — reliably, in parallel. That orchestration is exactly what local businesses want and will never build themselves. That's the opening Knotie gives you: spin up AI voice and chat agents for clients under your own brand and domain (Retell, LiveKit, VAPI, Ultravox, n8n), with a customer portal, credit billing, and your margin built in. You get good at directing agents here. Knotie lets you sell that to the people who can't.

See how Knotie works