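Claude Code Skill Safety: From "Please Stop" to "You Simply Can't Move"

38 Skills, three layers of defense, and one hard-won lesson: natural-language instructions are not a safety mechanism.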

TL;DR

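I have 38 Claude Code Skills (24 personal + 14 repo). An audit found that 12 of them had no checkpoint at all, 5 of those destructive, and another 10 relied only on a natural-language "please confirm before continuing." This post records how I patched the problem systematically with a three-layer defense model.

Core point: a natural-language instruction is a warning sign, not a physical barrier. You would not protect a cliff edge with nothing but a "Keep back" sign.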

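The problem: are your Skills actually safe?

Claude Code Skills are reusable workflow templates. A well-written Skill can compress a 10-minute operation into 30 seconds. The catch is that many Skills contain irreversible operations:

git push to a remote
aws s3 rm to delete S3 objects
kubectl delete to remove K8s resources
adb push to overwrite a system APK
--execute --yes to write to the production DB

I ran a full audit of my 38 Skills: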

Category	Count	Skills
Destructive, no checkpoint	5	`bads-skynet-e2e`, `device-test`, `commit-phase`, `test-folio`, `ventura-memory`
Destructive, natural language only	10	`bads-update`, `self-evolution`, `phase-impl`, `writing`, `code-review`, and others
Destructive, with AskUserQuestion	3	`prepare-feature-for-ux-review`, `sprint-plan`, `release-notes`
Read-only	17	`aosp-analysis`, `director`, `trade-analysis`, and others

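The most dangerous finding: my bads-update Skill runs eight steps end to end, from token refresh through a production DB write to S3 cleanup. The only safeguard in the middle is one sentence: "Confirm with user before proceeding". In a long context, the model may skip that sentence and keep executing.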
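
An audit like this can be roughed out by scanning each Skill's instruction text. The following is a minimal sketch, not the procedure used for the table above: the regular expressions and the `classify_skill` name are assumptions, and treating every Skill without a destructive pattern as read-only is a simplification.

```python
import re

# Hypothetical patterns; the article does not give the exact audit criteria.
DESTRUCTIVE = re.compile(
    r"git\s+push|aws\s+s3\s+rm|kubectl\s+delete|adb\s+push|--execute\s+--yes"
)
SOFT_CHECKPOINT = re.compile(r"confirm with user|wait\b.*\buser", re.IGNORECASE)


def classify_skill(text):
    """Bucket one Skill's instruction text into the audit categories."""
    if not DESTRUCTIVE.search(text):
        return "read-only"
    if "AskUserQuestion" in text:
        return "destructive, AskUserQuestion"
    if SOFT_CHECKPOINT.search(text):
        return "destructive, natural language only"
    return "destructive, no checkpoint"
```

A text scan only flags candidates. Whether a matched command is really destructive in context still needs a human read.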

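The three-layer defense model

I split safety mechanisms into three layers, in order of increasing reliability:

Layer 1 (warning sign)      → natural language: "wait for user confirmation"
Layer 2 (traffic light)     → AskUserQuestion tool call
Layer 3 (physical barrier)  → PreToolUse Hook / Skill splitting / disallowed-tools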

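Layer 1: natural language, or "please stop"

**→ WAIT** for user response before proceeding.
Confirm with user before proceeding.
Shall I go ahead and run it?

These are warning signs. The model sees them, weighs them against every other instruction, and then decides whether to comply. Compliance drops under these conditions:

Long-context dilution: the longer the conversation, the less weight early instructions carry
Task completion bias: the model tends to finish the task rather than stop
Vague wording: "Confirm with user" vs. "Use AskUserQuestion tool"

Conclusion: Layer 1 can only serve as a last line of defense. Never treat it as the primary one.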

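Layer 2: AskUserQuestion, a traffic light that can fail

Use AskUserQuestion to confirm:
> "Push to origin refs/for/main? (Change-Id: I1234)"
Only proceed after user confirms.

AskUserQuestion is an instruction to make a tool call. If the model decides to call it, the runtime blocks until the user responds. But the first step, the model deciding to call it, is still probabilistic.

GitHub Issue #19308 documents an instance of this: the model ignoring an explicit tool call instruction in a Skill.

probabilistic call → (if called) → runtime block
        ↑                               ↑
     may fail                     100% reliable

Compared with Layer 1 this adds a runtime block as a second stage, but the entry point is still probabilistic.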

Layer 3: runtime mechanisms, or "you simply can't move"

3a. PreToolUse Hook

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "if": "Bash(git push*)",
            "command": "/path/to/safety-hook.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

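The hook intercepts a tool call before it executes, and the model cannot bypass it. The if field ensures the hook process is spawned only when a dangerous pattern matches, so normal commands pay no performance cost.

The hook script reads JSON from stdin, inspects the command, and returns a permissionDecision: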

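#!/usr/bin/env bash
# PreToolUse safety hook: reads the tool call JSON on stdin, prints a decision.
INPUT=$(cat)
# Extract tool_input.command. If parsing fails, COMMAND is empty and the
# script falls through to the passthrough at the bottom (it fails open).
COMMAND=$(printf '%s' "$INPUT" | python3 -c "
import json,sys
print(json.load(sys.stdin).get('tool_input',{}).get('command',''))
" 2>/dev/null)

# DENY: hard block (checked before the generic push rule below)
if printf '%s' "$COMMAND" | grep -qE 'git\s+push\s+.*(--force|-f)\b'; then
  echo '{"hookSpecificOutput":{
    "hookEventName":"PreToolUse",
    "permissionDecision":"deny",
    "permissionDecisionReason":"Force push rewrites remote history."
  }}'
  exit 0
fi

# ASK: prompt user; (\s|$) also catches a bare "git push" with no arguments
if printf '%s' "$COMMAND" | grep -qE 'git\s+push(\s|$)'; then
  echo '{"hookSpecificOutput":{
    "hookEventName":"PreToolUse",
    "permissionDecision":"ask",
    "permissionDecisionReason":"Git push detected. Confirm target branch."
  }}'
  exit 0
fi

# Passthrough: no output means no decision from this hook
exit 0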

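Three possible decisions:

Decision	Effect	When to use
`deny`	Hard block; the user must re-authorize	Force push, S3 deletion, production DB writes
`ask`	Shows a confirmation prompt; execution can continue	Regular push, device reboot
(empty)	Passes straight through	Normal commands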
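
The input and output sides of such a hook can also be sketched in Python, which is easier to unit-test than shell. This is a minimal sketch that assumes the stdin payload carries the command under `tool_input.command` and that the response uses the `hookSpecificOutput` shape from the script above. The function names `extract_command` and `render_decision` are hypothetical.

```python
import json


def extract_command(payload):
    """Pull the Bash command out of the JSON text the hook receives on stdin."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return ""  # unparsable input is treated as "no command"
    if not isinstance(data, dict):
        return ""
    tool_input = data.get("tool_input")
    if not isinstance(tool_input, dict):
        return ""
    return str(tool_input.get("command", ""))


def render_decision(decision, reason=""):
    """Return the stdout text for a decision; an empty string means passthrough."""
    if decision not in ("deny", "ask"):
        return ""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    })
```

Keeping parsing and rendering separate from the matching rules means each decision in the table maps to exactly one output shape.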

3b. Skill splitting

/bads-update          → Steps 1-4 (dry-run only)
/bads-update-execute  → Steps 5-8 (execute + verify + cleanup)

The user has to type /bads-update-execute by hand. That is a runtime-level guarantee: the model cannot invoke on its own a Skill that requires the user to type it.

3c. `disallowed-tools`

---
name: aosp-analysis
disallowed-tools:
  - Edit
  - Write
---

This removes the write tools from the pool of tools available to the model. A read-only Skill has no need to modify files.

Implementation plan

Phase 1: PreToolUse Hook

Build one general-purpose hook script and configure 6 if filters:

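Pattern	Decision	Reason
`git push --force`	deny	Rewrites remote history
`aws s3 rm`	deny	S3 object deletion is irreversible
`kubectl delete` (except tunnel cleanup)	deny	Deletes K8s resources
`--execute --yes`	deny	Production DB write
`git push`	ask	Confirm the target branch
`adb reboot`	ask	Device reboot
`docker compose down`	ask	Stops services

The if field is the key. It filters using permission rule syntax, so the hook process is spawned only on a match. Normal commands such as ls, git status, and ./gradlew are unaffected.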
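
The pattern table can be expressed as an ordered rule list, with deny rules ahead of ask rules so that a force push never falls through to the softer decision. This is a minimal sketch, not the actual hook: the regular expressions are assumptions, and since the article does not show what the tunnel cleanup command looks like, `kubectl delete pod db-tunnel` is a hypothetical stand-in.

```python
import re

# Ordered rules: the first match wins, so deny rules come before ask rules.
# "kubectl delete pod db-tunnel" is a HYPOTHETICAL tunnel-cleanup command;
# the negative lookahead exempts only that exact form.
RULES = [
    (re.compile(r"git\s+push\b.*\s(--force\S*|-f)(\s|$)"), "deny",
     "Rewrites remote history"),
    (re.compile(r"aws\s+s3\s+rm\b"), "deny",
     "S3 object deletion is irreversible"),
    (re.compile(r"kubectl\s+delete\b(?!\s+pod\s+db-tunnel\b)"), "deny",
     "Deletes K8s resources"),
    (re.compile(r"--execute\s+--yes\b"), "deny", "Production DB write"),
    (re.compile(r"git\s+push\b"), "ask", "Confirm the target branch"),
    (re.compile(r"adb\s+reboot\b"), "ask", "Device reboot"),
    (re.compile(r"docker\s+compose\s+down\b"), "ask", "Stops services"),
]


def classify(command):
    """Return (decision, reason) for a Bash command string."""
    for pattern, decision, reason in RULES:
        if pattern.search(command):
            return decision, reason
    return "passthrough", ""
```

Because the exemption lives inside the kubectl rule rather than in an early return, a command that chains the cleanup with another `kubectl delete` or with `aws s3 rm` is still denied.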

Phase 2: split the most dangerous Skill

bads-update is my most dangerous Skill:

Before:

Steps 1-8: token → DB tunnel → dry-run → execute → verify → S3 cleanup
The only safeguard in the middle is one line: "Confirm with user before proceeding"

After:

/bads-update (Steps 1-4):
  token → DB tunnel → dry-run → STOP
  "Dry-run complete. Run /bads-update-execute when ready."

/bads-update-execute (Steps 5-8):
  execute → verify → S3 cleanup
  Hook intercepts --execute --yes and aws s3 rm

Double protection: the Skill split (the user must type the command by hand) plus hook interception (which still blocks inside the execute Skill).

Phase 3: add AskUserQuestion to destructive Skills

Skill	Checkpoint location	Question
`commit-phase`	Before push	"Push to {remote} refs/for/{branch}?"
`device-test`	Before APK push	"Push {apk} to {device}:/product/priv-app/?"
`self-evolution`	Before modification	"Apply these changes to CLAUDE.md / skills?"
`writing`	Before writing	"Write this content to {filepath}?"

Phase 4: add `disallowed-tools` to the 17 read-only Skills

A single frontmatter entry does it, and analysis Skills can no longer modify files.
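
With 17 Skills to update, it helps to check mechanically that none was missed. This is a minimal sketch that assumes each Skill file starts with frontmatter in the shape of the aosp-analysis example earlier. The parser is deliberately naive (no YAML library, block-style lists only), and both function names are hypothetical.

```python
def disallowed_tools(skill_md):
    """Naively read the `disallowed-tools` list from a Skill's frontmatter."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter at all
    tools = []
    in_list = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of frontmatter
        if stripped.startswith("disallowed-tools:"):
            in_list = True
            continue
        if in_list and stripped.startswith("- "):
            tools.append(stripped[2:].strip())
        elif in_list and stripped:
            in_list = False  # reached the next frontmatter key
    return tools


def is_write_locked(skill_md):
    """True when both Edit and Write are listed under disallowed-tools."""
    return {"Edit", "Write"} <= set(disallowed_tools(skill_md))
```

Running `is_write_locked` over every read-only Skill file and listing the ones that return False gives a quick to-do list.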

Verification

I tested the hook script with Python subprocess, and all 10 test cases passed:

PASS | safe cmd             | expected=passthrough  | actual=passthrough
PASS | adb reboot           | expected=ask          | actual=ask
PASS | docker down          | expected=ask          | actual=ask
PASS | s3 rm                | expected=deny         | actual=deny
PASS | kubectl del          | expected=deny         | actual=deny
PASS | tunnel cleanup       | expected=passthrough  | actual=passthrough
PASS | execute yes          | expected=deny         | actual=deny
PASS | regular push         | expected=ask          | actual=ask
PASS | force push           | expected=deny         | actual=deny
PASS | git status           | expected=passthrough  | actual=passthrough
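
A harness that produces rows like these can be sketched as follows. It is not the original test script: `run_hook` and `report` are hypothetical names, and the simulated payload assumes the hook only needs `tool_input.command`. The hook is passed as an argv list, so the same harness can drive a shell script or any stand-in.

```python
import json
import subprocess


def run_hook(argv, command, timeout=5):
    """Feed one simulated Bash tool call to a hook and return its decision."""
    payload = json.dumps({"tool_name": "Bash", "tool_input": {"command": command}})
    proc = subprocess.run(
        argv, input=payload, capture_output=True, text=True, timeout=timeout
    )
    out = proc.stdout.strip()
    if not out:
        return "passthrough"  # no output means the hook made no decision
    return json.loads(out)["hookSpecificOutput"]["permissionDecision"]


def report(argv, cases):
    """Print a PASS/FAIL row per (label, command, expected); True if all pass."""
    all_ok = True
    for label, command, expected in cases:
        actual = run_hook(argv, command)
        ok = actual == expected
        all_ok = all_ok and ok
        status = "PASS" if ok else "FAIL"
        print(f"{status} | {label:<20} | expected={expected:<12} | actual={actual}")
    return all_ok
```

For example, `report(["/path/to/safety-hook.sh"], [("regular push", "git push origin main", "ask")])` would exercise the script from the Layer 3 section.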

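The most interesting verification: during testing, the hook intercepted its own test command. Because the Bash command string contained --execute --yes, the PreToolUse hook treated it as a production DB write and denied it outright. That turned out to be the strongest end-to-end check: the hook really does work in the live runtime.

Before and after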

Before

38 skills:
  Layer 3: 0 skills   ← not a single one used runtime protection
  Layer 2: 1 skill
  Layer 1: 10 skills
  No checkpoint: 12 skills (5 destructive)
  Read-only: 17 skills (no protection)

After

38 skills:
  PreToolUse Hook:     6 if filters covering all dangerous Bash patterns
  Skill splitting:     bads-update → bads-update + bads-update-execute
  AskUserQuestion:     4 destructive skills
  disallowed-tools:    17 read-only skills

  Every destructive operation has at least one layer of runtime protection ✓

Design principles

1. Defense in depth, but the primary line must be Layer 3

Layer 3 (must-have)    → PreToolUse Hook / Skill splitting
Layer 2 (nice-to-have) → AskUserQuestion for UX
Layer 1 (last resort)  → natural-language prompts

Layers 2 and 1 are defense in depth, not the primary line.

2. The `if` filter is the key to performance

Without an if filter, every Bash command spawns a hook process. ls, git status, and ./gradlew build would each wait an extra 100-200ms.

// Good: spawn the hook only on git push
{"if": "Bash(git push*)", "command": "safety-hook.sh"}

// Bad: spawn the hook on every Bash command
{"command": "safety-hook.sh"}
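
The per-command cost depends on the machine and on what the hook does, so it is worth measuring rather than trusting any single figure. This is a minimal sketch that times process spawn using a trivial Python child as a stand-in for the hook. `mean_spawn_ms` is a hypothetical helper, and passing the real hook's argv instead would measure the real cost.

```python
import subprocess
import sys
import time


def mean_spawn_ms(argv, runs=5):
    """Average wall-clock time in ms to start a process and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # Send an empty JSON object on stdin, as a hook would receive a payload.
        subprocess.run(argv, input="{}", capture_output=True, text=True, check=False)
    return (time.perf_counter() - start) * 1000 / runs


if __name__ == "__main__":
    # Stand-in for the hook: a Python process that exits immediately.
    cost = mean_spawn_ms([sys.executable, "-c", "pass"])
    print(f"{cost:.1f} ms per spawn")
```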

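3. Skill splitting is more reliable than a checkpoint

A checkpoint inside a Skill, whether Layer 1 or Layer 2, can always be skipped by the model with some probability. Skill splitting requires the user to type a second slash command by hand, and the model has a 0% chance of getting around that.

4. `allowed-tools` ≠ restriction

In Claude Code, allowed-tools is pre-approval, not a restriction. Setting allowed-tools: [Bash, Read] does not stop the model from using Edit; it only means that using Edit triggers a permission prompt.

For a real restriction, use disallowed-tools.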
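
The difference can be captured in a toy model. This is a sketch of the semantics as described in this section, not Claude Code's actual implementation, and `tool_status` is a hypothetical function.

```python
def tool_status(tool, allowed=(), disallowed=()):
    """Toy model of the semantics described above (not Claude Code's code)."""
    if tool in disallowed:
        return "unavailable"  # removed from the tool pool entirely
    if tool in allowed:
        return "pre-approved"  # runs without a permission prompt
    return "needs permission prompt"  # still usable, but the user is asked


# allowed-tools: [Bash, Read] does not remove Edit; it only skips prompts
# for Bash and Read.
print(tool_status("Edit", allowed=["Bash", "Read"]))
# disallowed-tools: [Edit, Write] takes Edit away.
print(tool_status("Edit", disallowed=["Edit", "Write"]))
```

In this model, disallowed takes precedence, so a tool listed in both places is still unavailable. That precedence is an assumption of the sketch.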

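Closing thoughts

The core question in AI agent safety is not "will the model obey?" but "what happens if it doesn't?"

For read-only operations it hardly matters: the worst case is some wasted context.

For irreversible operations such as git push --force, aws s3 rm, and production DB writes, what you need is not "please ask me first" but "you simply can't move."

The PreToolUse Hook is that "you simply can't move."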

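References

Claude Code Hooks Guide: official documentation for PreToolUse Hooks
Claude Code Skills: Skill frontmatter definitions (allowed-tools, disallowed-tools)
Handle approvals and user input: blocking behavior of AskUserQuestion
GitHub Issue #19308: an instance of the model ignoring explicit tool call instructions in a Skill
GitHub Issue #18454: the model ignoring a natural-language checkpoint marked MANDATORY in CLAUDE.md

This post is part of the "Claude Code in Practice" series. Previous post: The Safety Gate for Claude Code Skills: From an Audit of 35 Skills to a Three-Layer Defense Model.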
