I had a Skill that handled K8s deployments. The flow ran for months without incident. Then one day I went back to look at how the “confirm” step was actually written: a single line, “Confirm with user before proceeding”, with no mechanism guaranteeing the model would stop.
That didn’t sit well with me.
The starting point: one line of natural language
The deployment Skill’s flow was: refresh token → dry-run → confirm → deploy.
The “confirm” step looked like this:
Step 4: Confirm with user before proceeding
Most of the time Claude would stop and ask if I wanted to continue. But “most of the time” and “guaranteed” are two different things. That line is natural language. The model will “try” to follow it during token generation, but there’s no runtime mechanism ensuring it actually stops.
I decided to go through every Skill I had and see how many had the same problem.
Auditing 35 Skills
I had two sets of Skills — 14 shared repo skills and 21 personal skills. After reading through all of them, I first sorted by whether they had destructive operations:
| Type | Count | Examples |
|---|---|---|
| Read-only / Advisory | 21 | Log analysis, code review, status checks |
| Has destructive operations | 14 | Deployments, git push, config changes, device commands |
Then I looked at how those 14 destructive Skills handled “confirm before executing”:
| Approach | Count |
|---|---|
| Nothing at all | 8 |
| Natural language (CHECKPOINT, STOP, Confirm with user) | 5 |
| Specifies calling AskUserQuestion tool | 1 |
14 Skills with destructive operations. 8 with no checkpoint whatsoever. 5 relying on a line of natural language.
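An audit like this can be roughly reproduced with a small script. The following is a minimal sketch, not the process actually used for the 35 Skills above: the function names `classify_skill` and `audit_skills` are hypothetical, the patterns are a crude heuristic (a bare `STOP` also matches unrelated uses of the word), and it assumes each Skill lives in a `SKILL.md` file somewhere under the directory you pass in.

```bash
#!/bin/sh
# Classify one Skill file by how it asks for confirmation.
# Prints one of: tool-call | natural-language | none
classify_skill() {
  file=$1
  if grep -q 'AskUserQuestion' "$file"; then
    echo "tool-call"
  elif grep -qE 'CHECKPOINT|STOP|WAIT|[Cc]onfirm with user' "$file"; then
    echo "natural-language"
  else
    echo "none"
  fi
}

# Walk a skills directory and tally how many Skills fall into each bucket.
# Assumes one SKILL.md per Skill; adjust the file name if your layout differs.
audit_skills() {
  dir=$1
  find "$dir" -name 'SKILL.md' | while read -r f; do
    classify_skill "$f"
  done | sort | uniq -c
}
```

Running `audit_skills ~/.claude/skills` (the path is an assumption about where personal Skills are stored) prints one count per bucket. The script cannot tell which Skills are destructive, so that first sort still has to be done by reading them.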
This isn’t just my setup. Search GitHub for public Claude Code Skills and you’ll find the same pattern everywhere — natural language signposts:
- The claude-code-starter-kit incident-response skill puts it right in the behavioral rules: “**STOP at checkpoints** — wait for user confirmation before proceeding”, with each phase ending in “**CHECKPOINT**: Present triage summary. Wait for user to confirm before investigation.”
- The claude-code-ultimate-guide talk-pipeline skill uses a CHECKPOINT step with “Do not invoke Stage 5 without explicit user confirmation”, and its anti-patterns section warns against “Skipping the CHECKPOINT — it’s the pipeline’s most important control point”.
- awesome-claude-skills curates 50+ verified skills. I went through them, and not a single one uses a runtime mechanism for checkpoints.
Whether you call it CHECKPOINT, STOP, WAIT, or Confirm with user, it’s the same thing: a line of natural language, hoping the model reads it and stops.
But these signposts are not 100%. GitHub Issue #18454 documents a case where a user wrote ⛔ MANDATORY SESSION START (DO NOT SKIP) and Wait for confirmation before proceeding in their CLAUDE.md — bold, emoji, all-caps — and the model acknowledged reading it, then completely ignored it, modifying 23 files in one go.
What about the one that used AskUserQuestion? It was a sprint planning skill that called AskUserQuestion after listing stories for the user to confirm. Written like this:
Use AskUserQuestion to confirm the stories:
- Question: "Are these stories correct?"
- Options:
  - "Correct, proceed" → Continue to next step
  - "Need changes" → Ask what to modify
My first reaction: “That’s the right approach. AskUserQuestion is a tool call. Once invoked, the runtime forcibly pauses generation and waits for user response. This is a hard constraint.”
Two weeks of testing later, I found that conclusion was only half right.
AskUserQuestion isn’t as hard as you’d think
Worth pausing to consider: the model deciding whether to invoke AskUserQuestion and deciding whether to obey CHECKPOINT/WAIT/STOP use the same mechanism — token generation.
CHECKPOINT/WAIT: Probabilistic compliance → outputs text and waits
AskUserQuestion: Probabilistic invocation → (if invoked) runtime forces a block
The second step is genuinely deterministic — once the tool call fires, the runtime pauses generation, presents UI, and waits for user selection. This is backed by official documentation:
Execution remains paused until your callback returns, and the SDK only cancels the wait when the query itself is cancelled.
But the first step? The model “deciding whether to issue the tool call” is probabilistic in exactly the same way as “deciding whether to obey CHECKPOINT.”
This isn’t speculation. GitHub Issue #19308 has a title that says it directly:
Claude systematically ignores Skill tool despite explicit BLOCKING REQUIREMENT instructions
All-caps bold “you MUST call this tool” in the Skill, and the model skips it anyway.
So is AskUserQuestion better than plain natural language? Yes — it adds a runtime protection layer. But is it 100%? No. The difference between the two is single-layer (pure probability) vs two-layer (probability + deterministic), not “soft vs hard.”
So what’s actually 100%?
After going through the official docs, I found three mechanisms that don’t depend on the model “deciding to comply” — they operate at the runtime layer, and the model can’t bypass them.
PreToolUse Hook
This is the strongest one. Hooks intercept tool calls before execution. You inspect the command and decide to allow or block:
.claude/settings.json:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/block-destructive.sh" }
        ]
      }
    ]
  }
}
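#!/usr/bin/env bash
# .claude/hooks/block-destructive.sh
# Reads the PreToolUse payload from stdin and denies destructive Bash commands.
INPUT=$(cat)
COMMAND=$(printf '%s' "$INPUT" | jq -r '.tool_input.command // empty')
if printf '%s' "$COMMAND" | grep -qE 'git push|kubectl apply|kubectl delete|rm -rf'; then
  # Build the JSON with jq so quotes inside $COMMAND are escaped correctly.
  jq -n --arg reason "Blocked: $COMMAND. Run manually if intended." \
    '{hookSpecificOutput: {hookEventName: "PreToolUse",
                           permissionDecision: "deny",
                           permissionDecisionReason: $reason}}'
  exit 0
fi
# Anything else falls through to the normal permission flow.
exit 0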
According to the official docs, PreToolUse Hooks block even bypassPermissions mode:
A hook that returns permissionDecision: "deny" blocks the tool even in bypassPermissions mode or with --dangerously-skip-permissions.
Model tries git push? Hook intercepts before the shell executes. The model can’t get around it because the whole thing happens outside the model’s control.
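Because the guarantee is only as good as the pattern, it helps to sanity-check the matcher against sample commands before relying on it. This is a minimal sketch under stated assumptions: `is_destructive` and `check` are hypothetical helpers that mirror the `grep` in the hook script above, and the sample commands are made up for illustration.

```bash
#!/bin/sh
# Hypothetical helper mirroring the hook's grep: returns 0 if the command
# matches the destructive-pattern list, 1 otherwise.
is_destructive() {
  printf '%s' "$1" | grep -qE 'git push|kubectl apply|kubectl delete|rm -rf'
}

# Print an allow/deny line for one sample command.
check() {
  if is_destructive "$1"; then
    echo "deny:  $1"
  else
    echo "allow: $1"
  fi
}

check "git status"
check "git push origin main"
check "kubectl apply -f deploy.yaml"
check "cd repo && rm -rf build"
# Substring matching has gaps: a flag between the two words slips through.
check "kubectl --context prod apply -f deploy.yaml"
```

The last sample comes back as allowed, since `kubectl apply` never appears as a contiguous string. The hook itself cannot be skipped by the model, but what it blocks depends entirely on how carefully the patterns are written.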
Skill splitting
Split one long-flow Skill into two independent Skills:
Before: /deploy → dry-run → confirm → deploy → verify
After: /deploy-prepare → dry-run → output results
/deploy-execute → user manually triggers → deploy → verify
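The user has to type /deploy-execute themselves. For this to be a runtime guarantee, the execute Skill has to be user-invocable only (Claude Code’s Skill frontmatter has disable-model-invocation: true for this); a Skill the model is allowed to invoke can still be triggered by the model on its own.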
disallowed-tools
---
name: log-analyzer
disallowed-tools:
- Edit
- Write
- Bash
---
disallowed-tools removes specified tools from the model’s available tool pool for the duration of the Skill. The model can’t see these tools, so it won’t call them. The caveat: the restriction clears after the user’s next message. Good enough for analysis Skills, not enough for deployment Skills.
One thing that’s easy to confuse here: allowed-tools is not a restriction. The official docs are explicit that it only grants permission (pre-approves tools), and does not prevent the model from calling tools outside the list. I got this backwards initially and only corrected it after checking the docs.
The three-layer protection model
After sorting through everything, all mechanisms for “making the model stop before destructive operations” fall into three layers:
| Layer | Approach | Depends on model compliance | Reliability |
|---|---|---|---|
| Natural language | CHECKPOINT, WAIT, STOP, Confirm | 100% dependent | Probabilistic |
| Tool call instruction | Use AskUserQuestion | Invocation decision dependent, execution independent | Probability + deterministic |
| Runtime mechanism | Hooks, Skill splitting, disallowed-tools | 0% dependent | 100% |
Back to that original K8s deployment Skill. Here’s the corrected protection:
- Hook intercepts kubectl apply: the model can try all it wants, it won’t execute
- AskUserQuestion presents options before deploy: the normal-flow UX
- Natural language “IMPORTANT: Never deploy without approval”: the last soft line of defense
The primary defense is the Hook. Even if AskUserQuestion gets skipped (Issue #19308 confirms this happens), kubectl apply is still blocked by the Hook. AskUserQuestion’s value isn’t security — it’s providing a better user experience (the selection UI).
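One practical wrinkle: if the dry-run step itself uses `kubectl apply --dry-run=client` (an assumption; the flow above does not say how the dry run is performed), a blanket `kubectl apply` pattern blocks the dry run too. The sketch below shows one way a hook's decision logic might let dry runs through while still denying real applies. `kubectl_decision` is a hypothetical helper, not part of any Claude Code API, and it deliberately errs on the side of denying.

```bash
#!/bin/sh
# Hypothetical decision helper for a PreToolUse hook.
# Prints "allow" or "deny" for a given shell command string.
kubectl_decision() {
  cmd=$1
  case "$cmd" in
    *"kubectl apply"*|*"kubectl delete"*) ;;  # inspect further below
    *) echo "allow"; return ;;
  esac
  # Be conservative with chained commands: a dry-run flag on one segment
  # must not whitelist a real apply in another segment.
  case "$cmd" in
    *"&&"*|*";"*|*"|"*) echo "deny"; return ;;
  esac
  # Only explicit client/server dry runs pass; --dry-run=none is a real apply.
  case "$cmd" in
    *--dry-run=client*|*--dry-run=server*) echo "allow" ;;
    *) echo "deny" ;;
  esac
}

kubectl_decision "kubectl apply --dry-run=client -f deploy.yaml"
kubectl_decision "kubectl apply -f deploy.yaml"
```

This is still string matching, so it does not cover command substitution, newline-separated commands, or flags placed between `kubectl` and `apply`. Treat it as a starting point to tighten rather than a complete filter.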
Decision framework
Does the Skill have irreversible operations?
│
├── No → No checkpoint needed
│ Optional: disallowed-tools to remove write tools
│
└── Yes → Runtime layer as primary defense
├── Hook to intercept dangerous commands (most flexible)
├── Skill splitting: prepare + execute (simplest)
└── Optional: layer AskUserQuestion on top for UX
Looking back
After going through 35 Skills, reading the official docs, and combing through GitHub Issues, the biggest takeaway wasn’t “discovering the three-layer model.” It was realizing that my intuition about LLM control mechanisms was wrong.
I assumed “telling the model to call a tool” was more reliable than “telling the model to stop and wait.” Sounds reasonable — tool calls have runtime protection, natural language doesn’t. But “the model deciding whether to call the tool” is itself probabilistic, using the same mechanism as “the model deciding whether to obey CHECKPOINT.”
One sentence: if a behavior’s safety depends on the model “deciding to comply” with your instruction, it’s not 100%. 100% only exists in mechanisms outside the model’s control.
References
- Claude Code Hooks Guide: official docs on PreToolUse Hooks
- Claude Code Skills: Skill frontmatter definitions (allowed-tools, disallowed-tools)
- Handle approvals and user input: AskUserQuestion’s blocking behavior
- GitHub Issue #19308: model ignoring explicit tool call instructions in Skills
- GitHub Issue #18454: model ignoring MANDATORY natural language checkpoints in CLAUDE.md and Skills
- claude-code-starter-kit incident-response: public Skill using the STOP at checkpoints pattern
- claude-code-ultimate-guide talk-pipeline: public Skill using the CHECKPOINT pattern
This is part of the “Claude Code in Practice” series. Previous: Git as an External Brain for Claude Code: Beyond MEMORY.md.