Why your main chat keeps getting dumber
Here's the thing nobody warns you about: an AI coding agent gets worse the longer you talk to it. Not because the model is weak — because its context window fills with noise. Every file it reads, every search it runs, every test log it dumps stays in the conversation, crowding out the decisions that actually matter. By the time it goes to edit code, it's reasoning over a haystack. Subagents are the fix. A subagent does the noisy work — searching the codebase, reading docs, checking whether a library is sane — off in its own separate context window, then hands back only the summary. The important reasoning happens early, while the main window is still lean and high-signal. Claude Code already does this on its own for big jobs. The upgrade is what those subagents can now do themselves.
- A full context window = worse decisions on the edits and commands that count
- A subagent runs noisy work in an isolated window and returns only the summary
- So the main conversation stays short, sharp, and high-signal
What actually changed: subagents can now nest
Until now there was a hard floor: a subagent could not spawn its own subagents. Delegation was one level deep and flat. Around June 9, 2026, Anthropic's Claude Code lead, Boris Cherny, announced nested subagent support — a subagent can now delegate to a deeper subagent, and so on, capped at five layers. (Worth being precise: this is a fresh rollout. The static Claude Code subagents doc still reads the old way, and Cherny framed the depth-5 cap as a starting point he wants feedback on — so treat 'five' as today's ceiling, not gospel.) Why does the ceiling matter? Because each layer gets its own clean window. A layer-1 subagent that hits something messy can offload it to layer 2 instead of polluting its own context — so the answer that bubbles back up to your main session is clean at every level. Codex shipped the same idea earlier as a config knob: agents.max_depth, which defaults to 1.
- Old rule: subagents could NOT spawn subagents (one flat layer)
- New: nested delegation up to 5 layers deep — each layer its own fresh window
- Attributed to Boris Cherny (Claude Code lead, Anthropic), ~Jun 9 2026 — early rollout, cap may move
- Codex equivalent: set agents.max_depth above 1 in config (default is 1)
The mental model: fan out, then collapse
Forget 'layers' for a second and picture a funnel turned on its side. Your main session asks one big question. It fans that question out into parallel subagents — one per chunk of work. Each of those, if it hits something noisy, fans out AGAIN into its own helpers. Work explodes outward, runs in parallel, then collapses back inward: deep results summarize up to their parent, parents summarize up to yours. You spend almost none of your main context, and you get an answer that was actually investigated in depth instead of skimmed. That single shape — fan out, then collapse — is what every pattern below is built on.
- Fan out: split the job so each subagent owns one clean slice
- Go deeper only where it's noisy: let a slice spawn its own helpers
- Collapse: every layer returns a summary up to its parent, not raw output
- Net effect: deep investigation, tiny main-context cost
Pattern 1 — Research & claim-verification fan-out
You have a folder of articles, or ten sources, and you want the big claims actually checked — not vibe-checked. Hand one source to each layer-1 subagent. Its job: extract the claims worth checking. For each claim that matters, it spawns a layer-2 subagent to do the noisy verification — search your knowledge base, hit the web, cross-reference. All sources run in parallel; all the messy searching is buried in layer 2. What comes back to you is a short list: claim, verdict, evidence. This is the single most useful pattern for anyone doing research, due diligence, or fact-checking at volume.
- Prompt: 'Load each article one per subagent. Extract the biggest claims.'
- Then: 'For each important claim, spawn a subagent to verify it against my notes and the web.'
- Let it run all sources in parallel — the searching stays in the deeper layer
- You receive a clean claim → verdict → evidence table, not a wall of search results
Pattern 2 — The debugging cause-tree
This is the most technical payoff, and the one that catches bugs a flat agent misses. A layer-1 subagent loads a fresh incident and lists the plausible causes. It fans those out: one layer-2 subagent per suspected cause, each searching the codebase for evidence. When one of them smells something deeper, it spawns a layer-3 subagent to cross-check the server logs. Why this beats one big context window: shallow debugging finds the first thing that looks broken and stops. The cause-tree keeps a separate clean window for the hunch that there's a deeper root cause underneath the obvious symptom — so it surfaces the real bug, not just the first one. Then the whole tree collapses back and you fix it in your main session, which has barely been touched.
- Layer 1: 'Load this incident, list every plausible cause.'
- Layer 2: one subagent per cause, each hunts the codebase for evidence — in parallel
- Layer 3: any cause that looks deeper spawns a check against the logs / live system
- The tree collapses to a single root cause; you fix it without burning your main context

Pattern 3 — Second & third-order effects (blast radius)
Big decisions fail on the consequences nobody traced. Say you're working through ten contracts, or ten files, toward one goal. Give each one its own subagent to consider the direct implications. Then go a layer deeper: a layer-2 subagent measures the blast radius — if we change THIS, what else has to change? Any noisy digging it needs (old email threads, related clauses, dependent code) it pushes down to a layer-3 subagent. You're now reasoning about second- and third-order effects in parallel, across every item, without holding all of it in one head. For anything high-stakes, this routinely uncovers consequences a single pass would have shipped right past.
- One subagent per item (contract / file / decision) → direct implications of the goal
- Layer 2: 'measure the blast radius — what else must change if we do this?'
- Layer 3: delegate the noisy lookups (history, dependencies) so layer 2 stays clean
- Use it before any irreversible change — it's a cheap way to see around corners
Pattern 4 — Compare skills head-to-head, and improve them overnight
Two more moves that only work now that agents can nest. First, head-to-head comparison: load Skill A into one layer-1 subagent and Skill B into another, in parallel. Each spawns its own helpers to actually do the job — say, two competing code-review skills, each running its own review subagents — then you ask the main session which results are better. You get an honest, run-it-for-real verdict instead of reading two READMEs. Second, optimization loops: have a subagent run a skill, spawn helpers to grade the output, then rewrite the skill and run it again — version after version, overnight if you let it — until it's reliably better. If it's a skill you use every day, that's some of the best token spend you'll find.
- Compare: 'Run Skill A and Skill B in separate subagents, then tell me which output is better and why.'
- Each skill spawns its own subagents to do the real work — a fair fight, not a spec sheet
- Optimize: run skill → grade output in subagents → rewrite skill → repeat
- Point it at a skill you run daily; let it iterate to a consistently better version
How to switch it on (and when NOT to)
In Claude Code: update to the latest version — nested subagents arrived in the June rollout. The smartest models will start reaching for a deeper subagent on their own when a job needs it, but today you'll often get more by asking for it explicitly ('do the verification in separate subagents') or by designing your skills to delegate. In Codex: open your config and, under [agents], set max_depth to the number of levels you want (it defaults to 1, which blocks grandchildren). The honest caveat: nesting is not free. Every layer adds latency and tokens, and most quick tasks should stay flat. Reach for depth when the work is genuinely big and noisy — research at volume, real debugging, high-stakes decisions — and keep everyday edits to a single context.
- Claude Code: update to latest; prompt nesting explicitly or bake it into your skills
- Codex: in config under [agents], raise max_depth above its default of 1
- Default to flat — only nest when the job is big and noisy enough to earn it
- Watch latency and token cost: depth is a power tool, not a default setting
Get the next drop
New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.