The Line Claude Cannot Cross

CLAUDE.md is a letter to the model. Hooks are a law. The difference matters when a polite instruction not to delete an S3 directory has failed for the third time this quarter.

The Polite Request

Every CLAUDE.md file ever written contains some version of the same clause. Do not delete files without permission. Do not run destructive commands. Ask before doing anything irreversible. These clauses are in the documentation, the system prompt, the project conventions, sometimes all three.

They do not work. Or rather, they work most of the time, which is the same thing as not working when the failure mode you care about is the rare destructive action.

A CLAUDE.md rule is a sentence in the context window. It influences the next-token distribution. It does not constrain output. The model is free to generate text that violates the rule, and sometimes does, either because the relevant rule fell below the attention threshold at the moment of decision, or because the model reasoned its way around it, or because the guidance was ambiguous in the specific case, or because adaptive thinking allocated zero reasoning tokens on the relevant turn. Take your pick.

The polite request is real. It is not enforcement.

What Hooks Are

Claude Code exposes a hook system: user-defined commands that run at specific lifecycle events. A hook is not part of the model. It is code that the harness executes deterministically, outside the language model's control, and whose output the harness respects.

The relevant events, for safety purposes, are:

PreToolUse. Runs before every tool call. Can inspect the tool name and arguments. Can exit with status 2 to block the call; other non-zero statuses are treated as non-blocking errors.
Stop. Runs when the assistant signals it has finished a turn. Can emit a JSON decision that forces the assistant to continue.
PostToolUse, UserPromptSubmit, and others. Useful for logging and observability.

A hook is a shell command. It reads a JSON payload from standard input, signals its decision through its exit status or a JSON object on standard output, and the harness obeys it. The language model cannot see the hook script. It cannot disable the hook. It cannot modify ~/.claude/settings.json without going through the hook (if you have wired one to catch file-write attempts against that path). The hook is outside the model's conversational reality.
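To make the settings.json case concrete, here is a minimal sketch of such a hook, one that answers with a JSON decision on standard output rather than an exit status. It assumes the hook is registered under PreToolUse with the matcher "Write|Edit", that those tools carry their target in tool_input.file_path, and that jq is installed. The file name, the protected paths, and the --hook argument (registered as part of the command, so the file can also be sourced for testing without reading stdin) are illustrative choices, not Claude Code features.

```shell
#!/bin/sh
# protect-settings.sh: hypothetical PreToolUse hook (matcher "Write|Edit").
# Denies agent writes to the hook configuration by printing a JSON decision.

# Succeed if $1 is a path this sketch treats as read-only for the agent.
is_protected_path() {
  case $1 in
    "$HOME/.claude/settings.json" | "$HOME/.claude/hooks/"*) return 0 ;;
    *) return 1 ;;
  esac
}

# PreToolUse JSON output: permissionDecision "deny" blocks the call and the
# reason is shown to the model. The JSON is only read when the hook exits 0.
deny_decision() {
  printf '%s\n' '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"Hook configuration is read-only for the agent."}}'
}

# Hypothetical convention: act as a hook only when called with --hook.
if [ "${1:-}" = "--hook" ]; then
  path=$(jq -r '.tool_input.file_path // empty')   # payload arrives on stdin
  if [ -n "$path" ] && is_protected_path "$path"; then
    deny_decision
  fi
  exit 0
fi
```

It guards only the Write and Edit tools. A Bash command that redirects into ~/.claude/settings.json never reaches it, and relative paths or symlinks are not resolved, so treat it as one layer, not a lock.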

A hook is not a preference. It is a rule the model cannot see, cannot reason around, and cannot forget.

Example 1: Behavioral Correction

The first example comes from Stella Laurenzo's GitHub issue on the anthropics/claude-code repository (#42796). Her team's hook, stop-phrase-guard.sh, was published as a public gist by Ben Vanik of Google, her IREE collaborator. It is a Stop hook. It watches for roughly fifty phrase patterns that correlate with Claude giving up early ("this appears to be a pre-existing issue", "should I continue?", "known limitation") and, when it matches, forces the assistant back into the task with a counter-argument.

Installing it on a MacBook is four commands:

# 1. Install jq (the hook uses it to parse JSON).
brew install jq

# 2. Download the hook to a stable location.
mkdir -p ~/.claude/hooks
curl -L -o ~/.claude/hooks/stop-phrase-guard.sh \
  https://gist.githubusercontent.com/benvanik/ee00bd1b6c9154d6545c63e06a317080/raw/stop-phrase-guard.sh
chmod +x ~/.claude/hooks/stop-phrase-guard.sh

Then register it as a Stop hook in ~/.claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$HOME/.claude/hooks/stop-phrase-guard.sh"
          }
        ]
      }
    ]
  }
}

Restart the session. The hook is live. Next time Claude attempts to finish a turn with "this looks like a pre-existing issue", the harness forces the assistant to keep working, with the message "NOTHING IS PRE-EXISTING. You own every change. Investigate the failure." injected as its next instruction.
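The gist is the thing to install; the sketch below only shows the moving parts, and it is not Vanik's script. The three phrases come from the examples quoted above and the reason string is the message above. The parts that follow Claude Code's documented Stop contract are the stop_hook_active and transcript_path input fields and the {"decision": "block", "reason": ...} output. The transcript layout read by last_assistant_text is an assumption about Claude Code's JSONL transcript, and the --hook argument is a convention of this sketch.

```shell
#!/bin/sh
# stop-guard-sketch.sh: hypothetical Stop hook, a much-reduced cousin of stop-phrase-guard.sh.

# Print the first give-up phrase found in $1 (case-insensitive); print nothing if none.
# Edit this list for your own project.
first_stop_phrase() {
  printf '%s\n' "$1" | grep -Fio \
    -e 'pre-existing issue' \
    -e 'should I continue?' \
    -e 'known limitation' | head -n 1
}

# Stop hook JSON output: "block" keeps Claude working; "reason" is the instruction it receives.
block_decision() {
  printf '%s\n' '{"decision":"block","reason":"NOTHING IS PRE-EXISTING. You own every change. Investigate the failure."}'
}

# ASSUMPTION: the transcript is JSONL and assistant entries look like
# {"type":"assistant","message":{"content":[{"type":"text","text":"..."}]}}.
last_assistant_text() {
  jq -rs '[.[] | select(.type == "assistant")] | last
          | (.message.content // [])[]? | select(.type == "text") | .text' "$1"
}

# Hypothetical convention: act as a hook only when called with --hook.
if [ "${1:-}" = "--hook" ]; then
  payload=$(cat)
  # Already continuing because of a Stop hook: allow the stop to avoid an endless loop.
  active=$(printf '%s' "$payload" | jq -r '.stop_hook_active // false')
  [ "$active" = "true" ] && exit 0
  transcript=$(printf '%s' "$payload" | jq -r '.transcript_path')
  if [ -n "$(first_stop_phrase "$(last_assistant_text "$transcript")")" ]; then
    block_decision
  fi
  exit 0
fi
```

Because it honors stop_hook_active, this version forces at most one extra turn per stop. The published hook may handle repeats differently, so check before relying on either behavior.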

Three things are worth noting. First, the hook will increase token usage, because every block means another turn. Second, the phrase list is project-specific; the default entries reference the author's CLAUDE.md golden rules and need to be edited to match your own. Third, the hook is the template, not the solution. The moment you run it once and see it fire, you will start wanting one for every other problem you have been losing to model drift.

Example 2: Command Interception

The second example is mine. About once a month, Claude would delete an S3 directory. Always for a reason that seemed plausible in the moment, always in violation of a CLAUDE.md rule I had already written three times. I got tired of it. I built cli-intercept.

cli-intercept is a PreToolUse hook on the Bash tool. Every shell command the model wants to execute is piped through a bash script that extracts the command string, normalizes whitespace, and tests it against a regex denylist. On a match, the hook exits with status 2, writes to a log, and prints a reason to stderr. The harness blocks the command. The model sees the reason and picks a different path.

# S3 destructive syncs/removes
aws s3 sync .* --delete
aws s3 rm +s3://
aws s3api delete-object( |$)
aws s3api delete-bucket
aws s3 rb .* --force

# Filesystem destruction
rm -rf( +--no-preserve-root)? +/( |$)
rm -rf +~( |/|$)

# Git footguns
git push .* --force( |$)
git reset --hard +origin
git branch -D +(main|master)

# Database destruction
DROP DATABASE
TRUNCATE TABLE

# Infra destruction
terraform destroy( |$)
kubectl delete +namespace

Representative entries from the cli-intercept denylist. One regex per line, tested against the normalized Bash command.
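A minimal sketch of the pipeline described above, written to read a denylist file in the format shown: one extended regex per line, with blank lines and # comments skipped. This is not cli-intercept's source. The file locations, the tab-separated log format, and the --hook argument are assumptions, and it requires jq. Register it under PreToolUse with the matcher "Bash" and a command such as "$HOME/.claude/hooks/deny-bash.sh --hook".

```shell
#!/bin/sh
# deny-bash.sh: hypothetical PreToolUse hook for the Bash tool, modeled on the
# pipeline described above. It is not cli-intercept's source.
# Assumed locations; override with environment variables.
DENYLIST="${DENYLIST:-$HOME/.claude/hooks/denylist.txt}"
BLOCK_LOG="${BLOCK_LOG:-$HOME/.claude/hooks/blocked.log}"

# Collapse each run of whitespace (spaces, tabs, newlines) to one space; trim the ends.
normalize_cmd() {
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Print the first pattern in file $2 that matches command $1 (unanchored grep -E search).
# Blank lines and '#' comments are skipped. Returns 1 when nothing matches.
first_match() {
  local cmd="$1" list="$2" pattern
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    case $pattern in '' | '#'*) continue ;; esac
    if printf '%s\n' "$cmd" | grep -Eq -- "$pattern"; then
      printf '%s\n' "$pattern"
      return 0
    fi
  done < "$list"
  return 1
}

handle_pretooluse() {
  local raw cmd rule
  raw=$(jq -r '.tool_input.command // empty')   # Bash payload arrives on stdin
  [ -n "$raw" ] || exit 0
  # Fail closed: with no readable denylist, block rather than wave everything through.
  if [ ! -r "$DENYLIST" ]; then
    printf 'Denylist not readable: %s\n' "$DENYLIST" >&2
    exit 2
  fi
  cmd=$(normalize_cmd "$raw")
  if rule=$(first_match "$cmd" "$DENYLIST"); then
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rule" "$cmd" >> "$BLOCK_LOG"
    printf 'Blocked by denylist rule: %s\n' "$rule" >&2   # exit 2: stderr goes to the model
    exit 2
  fi
  exit 0
}

# Hypothetical convention: act as a hook only when called with --hook.
if [ "${1:-}" = "--hook" ]; then
  handle_pretooluse
fi
```

The one choice here that is not in the description above is failing closed: if the denylist is missing, every Bash call is blocked instead of every call being allowed. Whether that is the right trade for your team is a policy decision.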

The design choice worth explaining is regex, not an LLM. A reviewer could reasonably ask why a deterministic regex list is better than a second model call that assesses whether the command is dangerous. The answer is in the failure mode you are defending against. The failure is a language model making a bad decision. Asking another language model to gatekeep reintroduces the same class of failure. A regex list is auditable, offline, zero-cost, and has exactly the semantics its author wrote. A gatekeeper model is none of those things.
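Auditability is easy to put into practice. The sketch below is a hypothetical regression check: write down commands the list must block and commands it must allow, and run them against the file. It assumes each pattern is applied as an unanchored grep -E search over the normalized command; confirm that against how your own hook evaluates the list.

```shell
#!/bin/sh
# audit-denylist.sh: hypothetical regression check for a denylist file.
# ASSUMPTION: each pattern is applied as an unanchored grep -E search over the
# normalized command; check this against how your hook actually evaluates it.

# Drop comments and blank lines. This matters with grep -f: an empty pattern
# matches every line, so one blank line would block every command.
active_patterns() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Succeed if any active pattern in file $2 matches command $1.
blocked_by() {
  local patterns status
  patterns=$(mktemp) || return 2
  active_patterns "$2" > "$patterns"
  printf '%s\n' "$1" | grep -Eq -f "$patterns"
  status=$?
  rm -f "$patterns"
  return "$status"
}

# Report disagreements between the list and its author's intent.
expect_block() { blocked_by "$1" "$2" || { printf 'FAIL (allowed): %s\n' "$1"; return 1; }; }
expect_allow() { ! blocked_by "$1" "$2" || { printf 'FAIL (blocked): %s\n' "$1"; return 1; }; }
```

Run it with lines such as expect_block "aws s3 rm s3://bucket/dir --recursive" denylist.txt. One case worth adding: under the assumption above, expect_block "git push --force origin main" denylist.txt fails, because git push .* --force( |$) needs a space on each side of .*, so it only matches when something sits between push and --force. The list does exactly what its author wrote; a check like this shows you what that is.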

The Advantage of Customization

CLAUDE.md and skills are shared artifacts. They live in the repository. They describe what the project wants. This is valuable, but it has a ceiling. Project conventions are what the project wants; user hooks are what the user enforces. The distinction matters.

A senior engineer running Claude Code on infrastructure code has a different safety profile than a junior developer running it on unit tests. The senior engineer wants strict denylists around aws s3 rm, terraform destroy, and kubectl delete namespace. The junior developer wants none of these rules, because their work never touches those commands, and false positives would be annoying. Both people are working on the same project. No single CLAUDE.md configuration serves them both.

Hooks can be configured per user. User-level hooks live in ~/.claude/settings.json, not in the repo. Each team member configures the safety layer appropriate to their role and their risk tolerance. The repo's CLAUDE.md still exists for project conventions; the user's hooks add the personal enforcement layer. This is the right separation of concerns. It is also the layer that department policy can plausibly mandate.

What Hooks Cannot Stop

Hooks are not foolproof. The most important thing a department head can internalize is exactly how they fail.

A regex denylist matches literal patterns. The model can evade a pattern by transforming the command. If the denylist blocks aws s3 rm, a sufficiently motivated model can write a Python script that imports boto3 and calls delete_object in a loop, then execute the Python script. The hook sees python cleanup.py, not aws s3 rm. The pattern does not match. The command runs.

The model can evade a pattern by writing a script that performs the forbidden action, then executing the script.

This is not hypothetical. It is the generalized version of the eval problem in web security. Any system that inspects surface syntax can be bypassed by code that generates the dangerous syntax at runtime. Defense in depth means assuming the first layer will fail and building a second layer.
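The gap is easy to demonstrate with one pattern from the denylist above. The hypothetical check() below applies it the way a regex hook would. Only the first command is a literal match; each of the others still reaches an S3 delete when executed (the second assuming cleanup.py is the boto3 script described above), yet none of them contains the text the pattern looks for.

```shell
#!/bin/sh
# evasion-demo.sh: the hook sees command text, not what the command does.
pattern='aws s3 rm +s3://'   # one entry from the denylist above

# Print BLOCK or ALLOW for a command string, as a regex hook would decide.
check() {
  if printf '%s\n' "$1" | grep -Eq -- "$pattern"; then echo BLOCK; else echo ALLOW; fi
}

check 'aws s3 rm s3://bucket/dir --recursive'   # BLOCK: literal match
check 'python cleanup.py'                       # ALLOW: boto3 inside the script does the deleting
check "aws s3 r''m s3://bucket/dir"             # ALLOW: the shell removes the quotes, the regex never does
check 'c=rm; aws s3 $c s3://bucket/dir'         # ALLOW: the verb is assembled at runtime
```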

The hook layer is still worth having. Most failures are not malicious evasion; they are ordinary mistakes the model makes because it is not thinking carefully. A denylist catches the ordinary mistakes. For the non-ordinary cases, you need the same thing every other security discipline already knows you need: least-privilege credentials, blast-radius limits, and logs that let you reconstruct what happened.

For Department Heads

The practical implications for anyone responsible for safeguarding AI usage inside an organization:

CLAUDE.md is not a safety layer. It is documentation. Treat it as such. Use it for what it is good at (explaining project conventions) and do not rely on it to prevent destructive actions.
Ship a standard hook configuration. Every engineer using Claude Code in your organization should have a baseline PreToolUse denylist on the Bash tool. Publish it, version it, update it. cli-intercept is one pre-built option. The specific list matters less than the fact that there is a list.
Pair hooks with IAM. Any credential that grants destructive capability should be scoped, logged, and reviewable. Hooks close the ordinary-mistake failure mode. IAM closes the escape-hatch failure mode. Neither alone is sufficient.
Audit what fires. A hook that has never fired is either perfectly designed or never exercised. Log every block to a central location. Review the logs. Patterns in what the model tried to do will tell you what the next rule needs to catch.
Teach hooks as a discipline, not a toggle. The engineers who will do the best job of this are the ones who understand the failure model, not the ones who copy-pasted a denylist. The distinction between "the model is wrong sometimes" and "the model is architecturally incapable of refusing a task it has been asked to do" is the distinction between treating hooks as a preference and treating them as a policy.
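For the audit point, assume each engineer's hook appends one line per block to a tab-separated log (timestamp, rule, command; a hypothetical format) and that the logs are collected in one place. A first review needs nothing more than standard text tools, as in this sketch:

```shell
#!/bin/sh
# summarize-blocks.sh: count denylist blocks per rule across collected logs.
# ASSUMPTION: each log line is timestamp<TAB>rule<TAB>command (a hypothetical format).

# Most frequently triggered rules first.
top_rules() {
  cat "$@" | cut -f2 | sort | uniq -c | sort -rn
}

# Blocked commands for one rule, to see what the model was actually trying to do.
# The rule is passed through the environment so awk does not reinterpret backslashes.
commands_for_rule() {
  local rule="$1"
  shift
  cat "$@" | RULE="$rule" awk 'BEGIN { FS = "\t" } $2 == ENVIRON["RULE"] { print $3 }'
}
```

For example, top_rules logs/*.tsv ranks the rules, and commands_for_rule 'aws s3 rm +s3://' logs/*.tsv lists what the model was trying to run when that rule fired.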

Hooks are a superpower. They are also not foolproof. Every practitioner deploying an agentic AI system in production needs to understand both halves of that sentence. The polite request fails in the long tail. Deterministic enforcement closes most of the tail. What remains is the work security engineers have been doing for fifty years: credentials, logging, and the assumption that any component can fail.

. . .

References

Anthropic. "Hooks Reference." Claude Code Documentation, 2026, code.claude.com/docs/en/hooks.
Vanik, Ben. "stop-phrase-guard.sh." GitHub Gist, 2026, gist.github.com/benvanik/ee00bd1b6c9154d6545c63e06a317080.
Trim, Craig. "cli-intercept: PreToolUse hook that gates Claude Code's Bash tool against a regex denylist." GitHub, 2026, github.com/craigtrim/cli-intercept.
Trim, Craig. "Unexpected Model Behavior." craigtrim.com, 2026, craigtrim.com/articles/unexpected-model-behavior/.
Laurenzo, Stella. "[MODEL] Claude Code is unusable for complex engineering tasks with the Feb updates." GitHub Issue #42796, anthropics/claude-code, 2026, github.com/anthropics/claude-code/issues/42796.
