
The Permission Boundary: Human-in-the-Loop at Scale

An AI with shell access and no guardrails will eventually destroy something you care about. This post dissects how a production harness implements layered permissions, hooks, dangerous pattern detection, and trust boundaries — balancing safety with usability.

Tin Dang
Figure: Hand-drawn architecture diagram of an AI harness with the Permission Boundary layer highlighted.

Here is a scenario that will be familiar to many developers who have given an AI agent too much freedom: the model decides to “clean up” by running a destructive command, force-pushing over a colleague’s work, or deleting files it deemed unnecessary. The model was not malicious. It was optimizing for the goal you gave it, with no understanding of the blast radius.

The permission boundary exists to prevent this. Not by limiting the model’s intelligence, but by encoding human judgment about what is safe, what is risky, and what requires explicit approval.

The Design Tension

A permission system that asks about everything is unusable. You will click “approve” reflexively until the one time you should have clicked “deny.” A permission system that asks about nothing is dangerous. The sweet spot is a system that understands the difference between reversible and irreversible actions and adjusts its behavior accordingly.

The production system resolves this with three layers: permission modes, permission rules, and hooks.

Permission Modes

The outermost control is the permission mode — a global setting that determines the system’s default behavior:

Mode              | Behavior
default           | Prompt for dangerous operations, allow safe ones
acceptEdits       | Auto-accept file edits, prompt for everything else
plan              | Execute only pre-approved plan steps
dontAsk           | Deny (not prompt) for anything not whitelisted
bypassPermissions | Skip all checks (testing only)
auto              | AI classifier auto-approves safe operations

Most users run in default mode. But the mode system enables a critical workflow: escalation and de-escalation over time. A new user starts in default, builds trust, and might move to acceptEdits. An unattended agent might run in auto with classifier-based approval. A paranoid deployment uses dontAsk with explicit allow-lists.

The mode is the macro control. Rules are the micro control.
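To make the table concrete, here is a minimal sketch of how a mode could map to a fallback behavior when no rule matches. The mode names come from the table above; `OperationKind`, `modeDefault`, and the exact mapping are illustrative assumptions, not the harness's actual code.

```typescript
// Mode names are taken from the table; everything else is hypothetical.
type PermissionMode =
  | 'default' | 'acceptEdits' | 'plan'
  | 'dontAsk' | 'bypassPermissions' | 'auto'

type Behavior = 'allow' | 'deny' | 'ask'

// Assumed coarse classification of what a tool call does.
type OperationKind = 'safe' | 'edit' | 'dangerous'

// Fallback behavior used only when no explicit rule matched.
function modeDefault(mode: PermissionMode, op: OperationKind): Behavior {
  switch (mode) {
    case 'bypassPermissions':
      return 'allow' // skips all checks (testing only)
    case 'dontAsk':
      return 'deny' // never prompts; anything not allow-listed is denied
    case 'acceptEdits':
      return op === 'edit' ? 'allow' : 'ask' // literal reading of the table
    case 'default':
      return op === 'safe' ? 'allow' : 'ask'
    case 'plan':
    case 'auto':
      // These depend on an approved plan or a classifier, which this
      // sketch does not model, so it conservatively asks.
      return 'ask'
  }
}
```

The point of the sketch is that the mode only supplies a default: explicit rules, covered next, are checked before this fallback applies.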

Permission Rules

Rules are specific patterns that override the mode’s default behavior. Each rule has three components:

type PermissionRule = {
  source: 'userSettings' | 'projectSettings' | 'localSettings' |
          'policySettings' | 'cliArg' | 'session'
  ruleBehavior: 'allow' | 'deny' | 'ask'
  ruleValue: {
    toolName: string       // Which tool
    ruleContent?: string   // Optional pattern (e.g., "git *")
  }
}

The source is critical. Rules from different sources have different trust levels and persistence:

  • Policy settings (enterprise admin) — Cannot be overridden by users. If your admin says “deny Bash(rm -rf *)”, that rule is absolute.
  • Project settings (.claude/settings.json) — Shared with the team via version control. “Always allow pnpm test.”
  • User settings (~/.claude/settings.json) — Personal preferences. “Always allow file reads.”
  • Local settings — Machine-specific, not committed.
  • Session — Temporary, expires when the conversation ends.
  • CLI arguments — One-time overrides for a specific invocation.

Rule evaluation order

When a tool requests permission, the system evaluates rules in precedence order:

  1. Check if allowManagedPermissionRulesOnly is set (enterprise lockdown) — if yes, only policy rules apply
  2. Otherwise, merge rules from all enabled sources
  3. Deny rules take precedence over allow rules when both match the same request
  4. Among the remaining candidates, the first matching rule wins

This means an enterprise admin can lock down the system completely, while still allowing project-level convenience rules within those bounds. A user cannot escalate past their admin’s constraints.
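The evaluation order above can be sketched as a small function. This is an illustration under stated assumptions: the `PermissionRule` shape and the `allowManagedPermissionRulesOnly` flag come from the text, while `matches`, `evaluateRules`, the trailing-`*` prefix matching, and the placement of `ask` between `deny` and `allow` are guesses made for the example.

```typescript
type Source =
  | 'userSettings' | 'projectSettings' | 'localSettings'
  | 'policySettings' | 'cliArg' | 'session'
type Behavior = 'allow' | 'deny' | 'ask'
type PermissionRule = {
  source: Source
  ruleBehavior: Behavior
  ruleValue: { toolName: string; ruleContent?: string }
}

// Assumption: a rule with no ruleContent matches every call to the tool,
// and a trailing "*" acts as a prefix wildcard.
function matches(rule: PermissionRule, toolName: string, input: string): boolean {
  if (rule.ruleValue.toolName !== toolName) return false
  const pattern = rule.ruleValue.ruleContent
  if (pattern === undefined) return true
  return pattern.endsWith('*')
    ? input.startsWith(pattern.slice(0, -1))
    : input === pattern
}

// Returns the winning rule, or undefined when the mode default should apply.
function evaluateRules(
  rules: PermissionRule[],
  toolName: string,
  input: string,
  allowManagedPermissionRulesOnly: boolean,
): PermissionRule | undefined {
  // Steps 1 and 2: enterprise lockdown keeps only policy rules.
  const active = allowManagedPermissionRulesOnly
    ? rules.filter(r => r.source === 'policySettings')
    : rules
  const hits = active.filter(r => matches(r, toolName, input))
  // Steps 3 and 4: deny outranks allow; within a behavior, first match wins.
  for (const behavior of ['deny', 'ask', 'allow'] as const) {
    const hit = hits.find(r => r.ruleBehavior === behavior)
    if (hit) return hit
  }
  return undefined
}
```

With a policy rule denying `Bash` with `rm -rf *` and a user rule allowing the same pattern, the deny rule wins regardless of list order, which is the property the lockdown story depends on.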

The Decision Flow

Every tool call passes through a decision pipeline:

Tool requests execution
  → Check forced decisions (pre-determined for this tool use)
  → Compute permission decision from rules + mode
      ├─ ALLOW → log decision, execute tool
      ├─ DENY  → log decision, notify user, skip tool
      └─ ASK   → escalation chain:
            ├─ Permission hooks (user-defined automation)
            ├─ Coordinator mode (delegate to parent agent)
            ├─ Speculative classifier (bash safety analysis)
            └─ Interactive user prompt (final fallback)

Each decision carries a reason — not just “allowed” or “denied,” but why:

type PermissionDecisionReason =
  | { type: 'rule', rule: PermissionRule }        // Matched a rule
  | { type: 'hook', hookName: string }            // Hook decided
  | { type: 'classifier', classifier: string }    // AI classified as safe
  | { type: 'mode', mode: PermissionMode }        // Mode default
  | { type: 'workingDir', reason: string }        // Outside allowed directory
  | { type: 'sandboxOverride' }                   // Sandbox environment

This traceability matters. When something goes wrong, you can answer: “Why was this allowed?” and “Which rule or hook made that decision?”
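The ASK branch of the pipeline is essentially a chain of responsibility: each step may settle the question or pass it on, and the interactive prompt is the last resort. The sketch below shows that shape. All names (`Resolver`, `resolveAsk`, `decidedBy`) are hypothetical, and the steps are synchronous for brevity, whereas real hooks and classifiers would presumably be asynchronous.

```typescript
type Verdict = 'allow' | 'deny'
type Resolution = { behavior: Verdict; decidedBy: string }

// A step returns a verdict, or undefined to pass the request along.
type Resolver = {
  name: string
  tryResolve: (command: string) => Verdict | undefined
}

function resolveAsk(
  command: string,
  chain: Resolver[],
  promptUser: (command: string) => Verdict,
): Resolution {
  for (const step of chain) {
    const behavior = step.tryResolve(command)
    if (behavior !== undefined) return { behavior, decidedBy: step.name }
  }
  // Nothing in the chain had an opinion: fall back to the human.
  return { behavior: promptUser(command), decidedBy: 'userPrompt' }
}

// Example chain in the order described above; the checks are toy stand-ins.
const exampleChain: Resolver[] = [
  { name: 'hook', tryResolve: cmd => (cmd.includes('/config/') ? 'deny' : undefined) },
  { name: 'coordinator', tryResolve: () => undefined },
  { name: 'classifier', tryResolve: cmd => (cmd === 'git status' ? 'allow' : undefined) },
]
```

Recording `decidedBy` alongside the verdict is what lets the decision reason be reconstructed later.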

Hooks: Programmable Permission Logic

Rules handle static patterns. Hooks handle dynamic logic. A hook is a user-defined shell command, API call, or prompt that executes at specific lifecycle points:

{
  "hooks": {
    "PreToolUse": [
      {
        "type": "command",
        "command": "bash scripts/check-protected-files.sh"
      }
    ],
    "PostToolUse": [
      {
        "type": "command",
        "command": "bash scripts/lint-changed-files.sh"
      }
    ]
  }
}

Hooks can return structured decisions:

{
  "decision": "block",
  "reason": "File is in protected directory",
  "permissionDecisionReason": "Company policy prohibits AI edits to /config/"
}

Hook types

  • Command hooks — Shell scripts that receive tool context as JSON on stdin
  • Prompt hooks — Claude API calls that evaluate the operation
  • HTTP hooks — Remote webhook calls for centralized policy evaluation
  • Function hooks — JavaScript callbacks (SDK only)
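As an illustration of a command hook, here is a sketch of the decision logic such a script might contain, written in TypeScript rather than shell. The input field names (`tool_name`, `tool_input.file_path`), the `PROTECTED_PREFIXES` list, and the function name are assumptions made for the example; check the harness's hook documentation for the actual payload shape.

```typescript
// Assumed shape of the JSON the hook receives on stdin.
type HookInput = {
  tool_name: string
  tool_input?: { file_path?: string }
}

// Mirrors the "block" decision shown above.
type HookDecision = { decision: 'block'; reason: string }

// Hypothetical list of directories the AI must not edit.
const PROTECTED_PREFIXES = ['config/', 'secrets/']

// Returns a block decision, or undefined to let the pipeline continue.
function checkProtectedFiles(rawJson: string): HookDecision | undefined {
  const input = JSON.parse(rawJson) as HookInput
  const path = input.tool_input?.file_path
  if (path === undefined) return undefined // not a file operation
  const hit = PROTECTED_PREFIXES.find(prefix => path.startsWith(prefix))
  return hit === undefined
    ? undefined
    : { decision: 'block', reason: `File is in protected directory ${hit}` }
}

// Wiring for a real script (left as a comment so the sketch has no side effects):
//   import { readFileSync } from 'node:fs'
//   const decision = checkProtectedFiles(readFileSync(0, 'utf8'))
//   if (decision) process.stdout.write(JSON.stringify(decision))
```

Keeping the decision logic in a pure function makes the policy easy to unit-test separately from the stdin and stdout plumbing.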

What hooks enable

Hooks turn the permission system from a static allow/deny list into a programmable policy engine:

  • Run a linter after every file edit
  • Block edits to protected directories
  • Require a second model’s approval for destructive operations
  • Log all tool uses to an audit system
  • Inject additional context (“warning: this file was modified 5 minutes ago by another developer”)

The hook output can include additionalContexts — messages injected into the conversation. This means a hook can not only block an operation but explain why in the model’s context, so the model adapts its approach rather than blindly retrying.

Dangerous Pattern Detection

The permission system’s most sophisticated component is its analysis of bash commands. A naive approach would pattern-match against strings: block anything containing rm, sudo, or curl. But developers legitimately run rm (deleting build artifacts), sudo (installing system packages), and curl (testing APIs).

The production system uses AST-level analysis:

// Simplified from bashSecurity.ts
function analyzeBashCommand(command: string): SecurityAnalysis {
  const ast = parseShellAST(command)
  return {
    hasSubshells: detectSubshells(ast),
    hasRedirects: analyzeRedirects(ast),
    hasHeredocs: detectHeredocPatterns(ast),
    hasCompoundCommands: detectCommandGroups(ast),
    envVarsSafe: inspectEnvironmentVariables(ast),
    executesCode: matchesCodeExecutors(ast),
    isDestructive: matchesDestructivePatterns(ast),
  }
}

Dangerous command categories

The system maintains two levels of dangerous patterns:

Cross-platform code executors (always flagged in auto mode):

  • Interpreters: python, node, deno, ruby, perl, php
  • Package runners: npx, bunx, npm run, pnpm run
  • Shells: bash, sh, zsh (as subcommands)
  • Meta-executors: eval, exec, env, xargs, sudo

Network and mutation operations (flagged for internal environments):

  • Cloud CLIs: kubectl, aws, gcloud
  • Network tools: curl, wget, ssh
  • Git operations: git push, git reset --hard

The key insight: the same command can be safe or dangerous depending on arguments. git status is safe. git push --force is dangerous. The AST analysis distinguishes between them by inspecting the full command structure, not just the binary name.
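A heavily simplified sketch shows why arguments matter. It is not AST analysis: it tokenizes on whitespace and refuses to judge anything containing shell metacharacters, which is exactly the gap a real parser closes. The `Risk` levels, the function name, and the specific git checks are assumptions made for illustration; the executor list is taken from the categories above.

```typescript
type Risk = 'safe' | 'review' | 'dangerous'

// Code executors listed above: arbitrary code can hide behind these.
const CODE_EXECUTORS = new Set([
  'python', 'node', 'deno', 'ruby', 'perl', 'php', 'npx', 'bunx',
  'bash', 'sh', 'zsh', 'eval', 'exec', 'env', 'xargs', 'sudo',
])

// Read-only git subcommands treated as safe in this sketch.
const SAFE_GIT_SUBCOMMANDS = new Set(['status', 'diff', 'log'])

function classifySimpleCommand(command: string): Risk {
  // Pipes, subshells, redirects, and chaining need a real parser,
  // so this sketch never calls such a command safe.
  if (/[;&|`$()<>]/.test(command)) return 'review'

  const [binary = '', ...args] = command.trim().split(/\s+/)
  if (CODE_EXECUTORS.has(binary)) return 'review'

  if (binary === 'git') {
    const sub = args[0] ?? ''
    const forced = args.some(a => a === '-f' || a.startsWith('--force'))
    if (sub === 'push' && forced) return 'dangerous'
    if (sub === 'reset' && args.includes('--hard')) return 'dangerous'
    if (SAFE_GIT_SUBCOMMANDS.has(sub)) return 'safe'
  }
  return 'review' // unknown commands default to a human look
}
```

Note the default: anything the sketch does not recognize is sent for review rather than assumed safe.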

Workspace Trust

Before any of this matters, the system must establish workspace trust. When you open a project for the first time, the harness asks: “Do you trust this workspace?”

This is not theater. Project-level settings (.claude/settings.json) can define permission rules, hooks, and skill configurations. A malicious repository could include settings that auto-approve destructive operations or inject hooks that exfiltrate data.

The trust dialog serves as a blast radius boundary:

  • Until trust is established, no hooks execute
  • Project settings are loaded but not applied
  • The user explicitly acknowledges: “I have reviewed this project’s AI configuration”

For non-interactive modes (SDK, background agents), trust is implicit — the operator is responsible for vetting the environment.
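The "loaded but not applied" distinction can be sketched as a gate between parsing project settings and using them. The types and function name here are hypothetical; the behavior follows the bullets above, including the implicit trust for non-interactive modes.

```typescript
// Hypothetical, simplified view of what a project can configure.
type ProjectSettings = { permissionRules: string[]; hooks: string[] }

function effectiveProjectSettings(
  loaded: ProjectSettings,
  opts: { trusted: boolean; interactive: boolean },
): ProjectSettings {
  // Non-interactive runs (SDK, background agents) treat trust as implicit:
  // the operator is responsible for vetting the environment.
  const trustEstablished = opts.trusted || !opts.interactive
  // Untrusted: the settings were parsed, but none of them take effect.
  return trustEstablished ? loaded : { permissionRules: [], hooks: [] }
}
```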

Enterprise Controls

The top of the trust hierarchy is enterprise policy. When allowManagedPermissionRulesOnly is set:

  • All user-defined rules are ignored
  • Only managed (policy) rules apply
  • Users cannot add allow rules that bypass policy
  • Hooks from non-managed sources are disabled

This creates a zero-trust enterprise layer: the admin defines what the AI can and cannot do, and individual users operate within those bounds. The implementation is simple but critical — it is a single boolean that changes the rule evaluation from “merge all sources” to “policy source only.”

Decision Logging and Auditability

Every permission decision fires a telemetry event:

  • tengu_tool_use_granted_in_config — matched an allow rule
  • tengu_tool_use_granted_by_classifier — AI classified as safe
  • tengu_tool_use_granted_in_prompt_permanent — user approved + saved rule
  • tengu_tool_use_granted_in_prompt_temporary — user approved, one-time
  • tengu_tool_use_rejected_in_prompt — user denied
  • tengu_tool_use_denied_in_config — matched a deny rule

These events feed into analytics, enabling questions like: “How often do users override default permissions?” “Which tools generate the most permission prompts?” “Are our default rules too restrictive or too permissive?”

For code-editing tools specifically, the system tracks additional metrics by programming language — enabling analysis of which languages generate more permission friction.
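As a rough illustration of how those events might be derived and aggregated, the sketch below maps a decision outcome to one of the event names listed above and tallies them. The event names are from the list; the `Outcome` type and both functions are hypothetical.

```typescript
// Hypothetical description of how a permission decision was reached.
type Outcome =
  | { via: 'config'; granted: boolean }
  | { via: 'classifier' }
  | { via: 'prompt'; granted: true; permanent: boolean }
  | { via: 'prompt'; granted: false }

function telemetryEventName(outcome: Outcome): string {
  switch (outcome.via) {
    case 'config':
      return outcome.granted
        ? 'tengu_tool_use_granted_in_config'
        : 'tengu_tool_use_denied_in_config'
    case 'classifier':
      return 'tengu_tool_use_granted_by_classifier'
    case 'prompt':
      if (!outcome.granted) return 'tengu_tool_use_rejected_in_prompt'
      return outcome.permanent
        ? 'tengu_tool_use_granted_in_prompt_permanent'
        : 'tengu_tool_use_granted_in_prompt_temporary'
  }
}

// Tally events, e.g. to see how often users reject prompts.
function countEvents(outcomes: Outcome[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const outcome of outcomes) {
    const name = telemetryEventName(outcome)
    counts.set(name, (counts.get(name) ?? 0) + 1)
  }
  return counts
}
```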

The Permission UX

The technical system is elegant. But the user experience is what determines whether people actually use it correctly.

When a tool triggers an ask decision, the user sees:

  1. What the tool wants to do (rendered by the tool’s renderToolUseMessage)
  2. Why it is asking (the permission reason)
  3. Options: Allow once, Allow always (save rule), Deny

“Allow always” is the lever that makes the system usable over time. Each approval can optionally become a permanent rule, reducing future friction. The system gets smarter as you use it — not because the AI learns, but because the permission rules accumulate your decisions.

The danger is “allow always” fatigue — users creating broad rules to avoid prompts, then forgetting they exist. The settings file is human-readable and version-controlled, providing a review mechanism. And enterprise policy can override user rules, creating a safety net below the user’s judgment.
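A small sketch shows how the three prompt options could translate into state, and how overly broad saved rules might be flagged for review. Everything here is an assumption for illustration: the choice names, saving to `localSettings`, and the definition of "broad" as a rule with no pattern or a bare `*`.

```typescript
type PermissionRule = {
  source: 'localSettings' | 'session'
  ruleBehavior: 'allow' | 'deny'
  ruleValue: { toolName: string; ruleContent?: string }
}
type PromptChoice = 'allowOnce' | 'allowAlways' | 'deny'

// Returns whether to run the tool and the (possibly extended) saved rules.
function applyPromptChoice(
  choice: PromptChoice,
  toolName: string,
  ruleContent: string,
  savedRules: PermissionRule[],
): { execute: boolean; savedRules: PermissionRule[] } {
  if (choice === 'deny') return { execute: false, savedRules }
  if (choice === 'allowOnce') return { execute: true, savedRules }
  // "Allow always": the approval becomes a persistent allow rule.
  const rule: PermissionRule = {
    source: 'localSettings', // assumed destination for saved rules
    ruleBehavior: 'allow',
    ruleValue: { toolName, ruleContent },
  }
  return { execute: true, savedRules: [...savedRules, rule] }
}

// Flags allow rules broad enough to deserve a periodic human review.
function isBroadRule(rule: PermissionRule): boolean {
  const pattern = rule.ruleValue.ruleContent
  return rule.ruleBehavior === 'allow' && (pattern === undefined || pattern === '*')
}
```

A periodic pass over the settings file with a check like `isBroadRule` is one way to counter the fatigue problem described above.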

Design Principles

After studying this system, three principles stand out:

  1. Classify by reversibility, not by danger. Reading a file is always safe — not because files are harmless, but because reading cannot cause damage. Writing a file is medium-risk — you can undo it. Pushing to a remote is high-risk — you cannot unpush. The classification follows blast radius, not intent.

  2. Make the common path frictionless. As a rule of thumb, the roughly 80% of tool calls that are obviously safe (file reads, search, test execution) should never prompt. The roughly 15% that need a quick glance (file edits, new file creation) should be one-key approval. Only the remaining 5% or so that are genuinely dangerous (network operations, destructive commands) should require careful review.

  3. Every decision should be traceable. When something goes wrong — and it will — you need to answer “how did this happen?” The decision reason, the rule source, the hook output, and the telemetry event provide the complete chain of accountability.

Next: Context Engineering — Building the Model’s World, where we examine how the harness constructs the model’s working memory from project files, persistent memories, and dynamic context.
