Why Claude tells you to walk to the car wash
Andrej Karpathy, Tesla's former director of AI, uses a small question to expose what today's models miss. Ask any of them: "The car wash is 50 metres away. Should I drive or walk?" Claude, Gemini, Grok, ChatGPT all say walk, because it's close. They miss the obvious: you need the car at a car wash. Karpathy's framing for this is the part worth keeping. In his words, the difference between old software and AI is verifiability: "traditional computers automate what you can specify in code; LLMs automate what you can verify." Models are sharp where there's a clear signal to check against, and they get shaky where the context lives only in your head. So the job is simple to state: get what's in your head into a form the model can check. That's what the three layers below do. (Heads up on attribution: context engineering and the verifiability thesis are Karpathy's; the Spec/Verifier/Environment packaging is just a teaching scaffold for applying them.)
- Models excel at what's measurable; they fumble context that only you hold
- Karpathy's verifiability thesis: code automates what you can SPECIFY; LLMs automate what you can VERIFY
- So the work isn't 'better prompts' — it's getting your understanding in, and giving the model something to check against
Layer 1 — The Spec: stop prompting, start specifying
Karpathy is lukewarm on plain plan mode — useful, but he'd rather work WITH the agent to design a genuinely detailed spec. A spec is just your understanding written down in a form Claude can build from. Don't dump one giant brief; bias toward small, compartmentalized specs you refine as you go. And when Claude drafts the spec for you, read it like an engineer, not a customer — that's the step everyone skips.
- Start in plan/spec mode, but go deeper than the default — make Claude interview you to pull the goal out of your head
- Keep specs SMALL and compartmentalized (one slice at a time), not one monster document
- Force a decision check — paste: "Before we build, restate the spec and make me verify each key decision explicitly, so nothing important is assumed."
- Read the spec critically yourself — if a line is vague to you, it's vaguer to the model
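One way to picture a small spec with an explicit decision check is as a tiny data structure. This is only an illustrative sketch: the `Spec` and `Decision` classes, their field names, and the example goal are all hypothetical and do not come from Karpathy or from any Claude feature.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One key decision in a spec; 'verified' means you explicitly confirmed it."""
    text: str
    verified: bool = False


@dataclass
class Spec:
    """A small, single-slice spec: one goal plus the decisions behind it."""
    goal: str
    decisions: list[Decision] = field(default_factory=list)

    def unverified(self) -> list[str]:
        """Return the decisions nobody has explicitly signed off on yet."""
        return [d.text for d in self.decisions if not d.verified]

    def ready_to_build(self) -> bool:
        """Buildable only when there are decisions and every one is verified."""
        return bool(self.decisions) and not self.unverified()


# Hypothetical example slice, made up for illustration.
spec = Spec(
    goal="Export weekly signups as CSV",
    decisions=[
        Decision("Week starts on Monday", verified=True),
        Decision("Timestamps are stored in UTC"),
    ],
)
print(spec.unverified())      # ['Timestamps are stored in UTC']
print(spec.ready_to_build())  # False
```

The point of the sketch is the gate: nothing gets built while any decision is still assumed rather than confirmed.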
Layer 2 — The Verifier: the lever almost nobody pulls
This is the one to actually get good at. The most painful part of working with AI is checking its output — so most people eyeball it and hope. Karpathy's point: verification is the real lever. The trick is to make the model help verify itself, and to give it something concrete to verify against. Three moves do most of the work.
- Set the bar up front: "Outline the evaluation criteria you'll use to judge a high-quality final product, then build to them" (kills vague asks like 'make it look good')
- Feed it a real reference: pull in a past report, the actual schema, or a known-good example so 'correct' has a definition
- Wire it to reality where you can: let Claude hit the deployed system or your data to confirm the thing actually worked, not just that it 'looks right'
- Bundle it: end build prompts with "Add a verification step that checks the output against the criteria and the reference before you call it done."
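The moves above can be sketched as a small check that runs before anything is called done. This is a minimal sketch under stated assumptions: the criteria names, the `REFERENCE_ROW` example, and the sample output are all made up for illustration, and a real verifier would use whatever criteria and reference you agreed on up front.

```python
from typing import Callable

# Hypothetical known-good example: its keys define what 'correct' looks like.
REFERENCE_ROW = {"date": "2024-01-01", "region": "EMEA", "revenue": 1200.0}

Criterion = Callable[[list[dict]], bool]

# Evaluation criteria written down up front, each one a named, checkable rule.
CRITERIA: dict[str, Criterion] = {
    "has at least one row": lambda rows: len(rows) > 0,
    "columns match the reference": lambda rows: all(
        row.keys() == REFERENCE_ROW.keys() for row in rows
    ),
    "revenue is never negative": lambda rows: all(
        row.get("revenue", 0) >= 0 for row in rows
    ),
}


def verify(rows: list[dict]) -> list[str]:
    """Return the names of the criteria that the output fails."""
    return [name for name, check in CRITERIA.items() if not check(rows)]


# Made-up output standing in for what the model produced.
output = [
    {"date": "2024-02-01", "region": "APAC", "revenue": 900.0},
    {"date": "2024-02-02", "region": "EMEA", "revenue": -50.0},
]
failures = verify(output)
print(failures)  # ['revenue is never negative']
```

An empty list means every criterion passed. A non-empty list names exactly what to send back for another pass, which is more useful than "looks off".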

Layer 3 — The Environment: make good behavior automatic
Layers 1 and 2 need somewhere to live, or you'll re-type them every session. That's the environment: your CLAUDE.md, your rules, your guardrails. Think of a workshop. The spec is the blueprint on the wall. The verifier is the quality-check station by the door. The environment is the room that makes both happen without you asking.
- Bake verification into CLAUDE.md: "Before anything multi-step, include a verification plan" — now it's enforced, not remembered
- Sort actions into three buckets: ALWAYS-DO (run on autopilot), ASK-FIRST (double-check), NEVER-DO (hard lines)
- Enforce the hard lines at the tool level, not the prompt level: a pre-tool-use hook that blocks edits to protected files means Claude literally can't cross the line (a prompt rule is a request; a hook is a wall)
- Audit it occasionally — ask Claude to review your CLAUDE.md and flag rules that are vague or unenforced
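A hard line enforced at the tool level might look roughly like the script below. It assumes the hook runner passes the pending tool call as JSON on stdin with a `tool_input.file_path` field, and treats exit code 2 as "block this call and show stderr to the model". That matches my understanding of Claude Code's PreToolUse hooks, but check the current hook documentation before relying on it. The `PROTECTED_PATTERNS` list and the function names are hypothetical.

```python
import json
import sys
from fnmatch import fnmatch

# Hypothetical NEVER-DO list: glob patterns for files the agent must not edit.
PROTECTED_PATTERNS = [".env", "*.pem", "migrations/*"]

ALLOW = 0  # assumed: exit code 0 lets the tool call proceed
BLOCK = 2  # assumed: exit code 2 blocks the call and returns stderr to the model


def is_protected(path: str) -> bool:
    """True if the path matches a protected pattern, relative or nested."""
    return any(
        fnmatch(path, pattern) or fnmatch(path, "*/" + pattern)
        for pattern in PROTECTED_PATTERNS
    )


def exit_code_for(event: dict) -> int:
    """Decide whether a tool call may run, based on the file it targets."""
    path = event.get("tool_input", {}).get("file_path", "")
    return BLOCK if path and is_protected(path) else ALLOW


def main() -> None:
    """Entry point when run as a hook: read the event from stdin, then exit."""
    event = json.load(sys.stdin)
    code = exit_code_for(event)
    if code == BLOCK:
        print("Blocked: this file is on the NEVER-DO list.", file=sys.stderr)
    sys.exit(code)


# Dry run with made-up events instead of a live hook invocation.
print(exit_code_for({"tool_name": "Edit", "tool_input": {"file_path": ".env"}}))        # 2
print(exit_code_for({"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}}))  # 0
```

To use something like this as an actual hook, you would call `main()` at the bottom of the script and register it for edit and write tools in your settings. The dry run is there so the decision logic can be checked without a live session.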
The one thing to actually get good at
Asked what's still worth learning deeply when intelligence gets cheap, Karpathy's answer was sharp: "You can outsource your thinking, but you can't outsource your understanding." Every layer here is just a way to keep your understanding in the loop. The spec puts it on paper. The verifier holds the model to it. The environment defends it when you're not looking. Tools get faster every month. The edge that lasts is being the person who understands the problem well enough to point them at the right thing.
- Spec = your understanding, made explicit
- Verifier = your standard, made checkable
- Environment = your intent, made automatic
- The compounding skill isn't prompting — it's understanding the problem deeply enough to steer