Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

Why Claude tells you to walk to the car wash

Andrej Karpathy — Tesla's former AI director — uses a small question to expose what today's models miss. Ask any of them: "The car wash is 50 metres away. Should I drive or walk?" Claude, Gemini, Grok, ChatGPT all say walk — it's close. They miss the obvious: you need the car at a car wash. Karpathy's framing for this is the part worth keeping. In his words, the difference between old software and AI is verifiability: "traditional computers automate what you can specify in code; LLMs automate what you can verify." Models are sharp where there's a clear signal to check against, and they get shaky where the context lives only in your head. So the job is simple to state: get what's in your head into a form the model can check. That's what the three layers below do. (Heads up on attribution: the Spec/Verifier/Environment framing is a clean way to apply Karpathy's ideas — context engineering and the verifiability thesis are his; the three-layer packaging is the teaching scaffold.)

Models excel at what's measurable; they fumble context that only you hold
Karpathy's verifiability thesis: code automates what you can SPECIFY; LLMs automate what you can VERIFY
So the work isn't 'better prompts' — it's getting your understanding in, and giving the model something to check against

Layer 1 — The Spec: stop prompting, start specifying

Karpathy is lukewarm on plain plan mode — useful, but he'd rather work WITH the agent to design a genuinely detailed spec. A spec is just your understanding written down in a form Claude can build from. Don't dump one giant brief; bias toward small, compartmentalized specs you refine as you go. And when Claude drafts the spec for you, read it like an engineer, not a customer — that's the step everyone skips.

Start in plan/spec mode, but go deeper than the default — make Claude interview you to pull the goal out of your head
Keep specs SMALL and compartmentalized (one slice at a time), not one monster document
Force a decision check — paste: "Before we build, restate the spec and make me verify each key decision explicitly, so nothing important is assumed."
Read the spec critically yourself — if a line is vague to you, it's vaguer to the model

Layer 2 — The Verifier: the lever almost nobody pulls

This is the one to actually get good at. The most painful part of working with AI is checking its output — so most people eyeball it and hope. Karpathy's point: verification is the real lever. The trick is to make the model help verify itself, and to give it something concrete to verify against. Three moves do most of the work.

Set the bar up front: "Outline the evaluation criteria you'll use to judge a high-quality final product, then build to them" (kills vague asks like 'make it look good')
Feed it real reference: pull in a past report / the actual schema / a known-good example so 'correct' has a definition
Wire it to reality where you can: let Claude hit the deployed system or your data to confirm the thing actually worked, not just that it 'looks right'
Bundle it: end build prompts with "Add a verification step that checks the output against the criteria and the reference before you call it done."

An AI output being checked against a glowing checklist and gauge — the verification layer

Layer 3 — The Environment: make good behavior automatic

Layers 1 and 2 need somewhere to live, or you'll re-type them every session. That's the environment: your CLAUDE.md, your rules, your guardrails. Think of a workshop. The spec is the blueprint on the wall. The verifier is the quality-check station by the door. The environment is the room that makes both happen without you asking.

Bake verification into CLAUDE.md: "Before anything multi-step, include a verification plan" — now it's enforced, not remembered
Sort actions into three buckets: ALWAYS-DO (run on autopilot), ASK-FIRST (double-check), NEVER-DO (hard lines)
Enforce the hard lines at the tool level, not the prompt level: a pre-tool-use hook that blocks edits to protected files means Claude literally can't cross the line (a prompt rule is a request; a hook is a wall)
Audit it occasionally — ask Claude to review your CLAUDE.md and flag rules that are vague or unenforced

The one thing to actually get good at

Asked what's still worth learning deeply when intelligence gets cheap, Karpathy's answer was sharp: "You can outsource your thinking, but you can't outsource your understanding." Every layer here is just a way to keep your understanding in the loop. The spec puts it on paper. The verifier holds the model to it. The environment defends it when you're not looking. Tools get faster every month. The edge that lasts is being the person who understands the problem well enough to point them at the right thing.

Spec = your understanding, made explicit
Verifier = your standard, made checkable
Environment = your intent, made automatic
The compounding skill isn't prompting — it's understanding the problem deeply enough to steer

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

Frequently asked questions

What is Karpathy's method for using Claude?

A practical way to apply how Andrej Karpathy works with AI: write a detailed spec (your understanding, in a form Claude can build from), add a verifier (criteria + references so the model checks its own output), and set up the environment (CLAUDE.md and hooks that make both automatic). The labels are a teaching scaffold; the underlying ideas — context engineering and verification — are Karpathy's.

Is this just Claude's plan mode?

No. Karpathy has said he finds plain plan mode useful but limited — the move is to work with the agent to co-design a detailed, compartmentalized spec and then verify key decisions, rather than accept the first plan it produces.

What is 'context engineering'?

Karpathy's preferred term over 'prompt engineering': the art of filling the context window with exactly the right information for the next step. It reframes the work from 'clever wording' to 'supplying the right context and references.'

What's the verifiability thesis / the car wash example?

Karpathy's framing that 'traditional computers automate what you can specify in code; LLMs automate what you can verify.' The car-wash question (should you drive or walk 50m to a car wash?) shows models stumbling where the context isn't verifiable — which is why the verifier layer matters.

Do I need to be a coder to use this?

No. Specs, evaluation criteria, and the always/ask-first/never buckets are written in plain language. The only mildly technical bit is a CLAUDE.md file and an optional pre-tool-use hook for hard rules — both are copy-paste once you've seen them.

Sources · Stop Prompting Claude. Use Karpathy's Method Instead — Austin Marchese · Andrej Karpathy on 'context engineering' over 'prompt engineering' (X) · Context Engineering — handbook inspired by Karpathy (GitHub) · Claude Code — CLAUDE.md & hooks (Anthropic docs)

Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

Why Claude tells you to walk to the car wash

Layer 1 — The Spec: stop prompting, start specifying

Layer 2 — The Verifier: the lever almost nobody pulls

Layer 3 — The Environment: make good behavior automatic

The one thing to actually get good at

Get the next drop

Frequently asked questions

Verified agents are worth money. Most businesses can't build them.

Grab the AI Reseller Starter Kit

Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

Why Claude tells you to walk to the car wash

Layer 1 — The Spec: stop prompting, start specifying

Layer 2 — The Verifier: the lever almost nobody pulls

Layer 3 — The Environment: make good behavior automatic

The one thing to actually get good at

Get the next drop

Frequently asked questions

More free guides

Verified agents are worth money. Most businesses can't build them.

Grab the AI Reseller Starter Kit