How Chat-Driven Execution Works
The Headline Diagram
Every prompt you type travels through eight distinct stages before a response appears. The diagram below maps the full journey — from keypress to memory persistence.
The 8 Stages Explained
Stage 1 — User Input → Claude Code CLI
The Claude Code CLI receives your prompt text. Nothing else has happened yet — no governance rules are loaded, no tools are called. The CLI is the entry point that bootstraps everything that follows.
Stage 2 — SessionStart Hook (first turn only)
On the very first turn of a session, the hook script on-session-start.sh fires before Claude processes anything. It reads SESSION-AUDIT.md (the running log of what is in flight), ACTIVE-PROJECT.md (the current primary work item), and performs a memory search using the claude-memory.py CLI against the 44 per-agent SQLite databases in /home/opsadmin/.openclaw/memory/. The result is injected directly into the system prompt, so Claude starts every session already oriented to current state rather than asking you “where were we?”
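That injection step can be sketched as follows. The file names come from the text above; the function shape, heading labels, and formatting are assumptions, not the real hook's code.

```python
from pathlib import Path

# Paths named in the text above; everything else in this sketch is illustrative.
AUDIT = Path.home() / "SESSION-AUDIT.md"
ACTIVE = Path.home() / "ACTIVE-PROJECT.md"

def build_session_context(memory_hits):
    """Assemble the audit log, active project, and top memory hits into
    one block to be injected into the system prompt."""
    parts = []
    for label, path in (("SESSION AUDIT", AUDIT), ("ACTIVE PROJECT", ACTIVE)):
        if path.exists():
            parts.append("## " + label + "\n" + path.read_text())
    if memory_hits:
        parts.append("## MEMORY HITS\n" + "\n".join("- " + h for h in memory_hits))
    return "\n\n".join(parts)
```

The point is that orientation is assembled from files and search results, not from asking the user.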
Stage 3 — Request Routing via CLAUDE.md
Claude reads the full governance document at /home/opsadmin/CLAUDE.md — specifically the Request Routing Protocol section and the Tool Trigger Conditions table. This step determines: Does a matching plan already exist in ~/.claude/plans/? Which agent domain owns this work? What skill, if any, should auto-invoke? Is this execution-mode work that should be dispatched to a Sonnet subagent rather than handled by Opus directly? Routing happens before any tool is called or any code is written.
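A toy sketch of that decision order follows. The trigger phrases here are illustrative guesses — the real triggers live in the Request Routing Protocol and the Tool Trigger Conditions table, not in this code.

```python
# Illustrative routing sketch -- trigger words and return labels are
# assumptions; the real logic is defined in CLAUDE.md.
def route(prompt):
    p = prompt.lower()
    if p.startswith("/"):
        return "skill"                        # slash command -> skill lookup
    if any(w in p for w in ("go ahead", "implement", "execute")):
        return "dispatch-to-sonnet-subagent"  # execution-mode work
    if any(w in p for w in ("what did we decide", "remember")):
        return "mcp:memory_search"            # memory retrieval trigger
    return "direct"                           # Opus handles it inline
```

The essential property is ordering: routing runs to completion before any tool is selected.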
Stage 4 — Tool Selection
Claude picks the appropriate tool for the work:
- Skill — loads a YAML workflow from ~/.claude/skills/<name>/SKILL.md and re-enters with that context in scope
- MCP tool — dispatches to one of the 14 connected MCP servers (configured in ~/.mcp.json), each of which runs as a child process
- Agent tool — spawns a Sonnet or Haiku subprocess with an isolated context window and a structured 8-element prompt
- Bash / Read / Edit — built-in tools that execute directly in the shell or filesystem
Stage 5 — Sub-system Dispatch
Each tool type has a different execution substrate. Bash tools run shell commands directly and return stdout/stderr. MCP tools invoke a JSON-RPC call to a connected server process — for example, memory_search calls into the openclaw-tools MCP server, which runs claude-memory.py under the hood. Agent tool dispatches create a fully isolated subprocess with its own context; that subagent reads its own files, calls its own tools, and returns a structured result. Skills re-inject YAML-based instructions and constraints into the current conversation before Claude continues.
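For the MCP path, the wire format is a JSON-RPC 2.0 request. A minimal sketch of the memory_search call mentioned above — the method name follows the MCP tools/call convention, and the top_k argument is an assumption:

```python
import json

# JSON-RPC 2.0 envelope for an MCP tool call; the "tools/call" method is
# the standard MCP shape, while the argument names are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memory_search",
        "arguments": {"query": "cred rotation", "top_k": 5},
    },
}
wire = json.dumps(request)
```

The server process receives this over stdio, runs claude-memory.py, and replies with a matching JSON-RPC response.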
Stage 6 — LLM Call via Portkey :18900
Every call to an LLM — Anthropic, OpenAI, Moonshot, or OpenRouter — goes through the Portkey proxy running at port 18900 (/home/opsadmin/.openclaw/portkey-proxy/proxy.js). The proxy reads a config slug (e.g. sonnet-for-coding, kimi-for-calls) to determine which provider and model to route to, applies the correct per-agent cache namespace, and attaches metadata (agent ID, session ID, tier). The response comes back through Portkey, which non-blockingly writes a row to the tool_calls Supabase table (CHOKEPOINT-1) and sends a trace to Langfuse at :18840 for observability. If Postgres is unreachable, the row is queued to /tmp/openclaw/tool-calls-fallback.jsonl for later drain.
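The write-with-fallback behavior can be sketched like this. The fallback path and file name come from the text above; insert_fn is a stand-in for the real Supabase client call, and the row fields are illustrative.

```python
import json
import os

FALLBACK = "/tmp/openclaw/tool-calls-fallback.jsonl"  # path from the text above

def record_tool_call(row, insert_fn):
    """Try the Postgres insert; on failure, append the row to a JSONL
    fallback file so a later drain job can replay it."""
    try:
        insert_fn(row)
        return "postgres"
    except Exception:
        os.makedirs(os.path.dirname(FALLBACK), exist_ok=True)
        with open(FALLBACK, "a") as f:
            f.write(json.dumps(row) + "\n")
        return "fallback"

def failing_insert(row):
    # Simulate Postgres being unreachable.
    raise ConnectionError("db down")
```

Because the write is non-blocking in the real proxy, a database outage degrades the audit trail to the queue file instead of delaying the LLM response.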
Stage 7 — PostToolUse Hook
After every tool call completes, on-post-tool.sh fires. It writes a JSONL audit entry to /home/opsadmin/.openclaw/logs/claude-code-audit.log and enqueues the interaction summary to the memory pipeline. The pipeline calls Voyage-4 to embed the text, then stores the embedding in the chunks_vec table (vec0 vector index) and the raw text in chunks_fts (FTS5 full-text index) of the relevant agent’s SQLite database. This is how decisions, plans, and discoveries survive across sessions.
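A simplified sketch of the dual write, assuming neither Voyage-4 nor the sqlite-vec extension is available here: a single plain table stands in for the vec0 and FTS5 indexes, and embed() is a placeholder for the real embedding call.

```python
import json
import sqlite3

# In-memory stand-in for an agent's SQLite database. The real pipeline
# writes to chunks_vec (vec0) and chunks_fts (FTS5); this sketch collapses
# both into one table to show the shape of the write.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")

def embed(text):
    # Placeholder for a Voyage-4 embedding call.
    return [float(len(text)), 0.0]

def store_chunk(text):
    cur = db.execute(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embed(text))),
    )
    return cur.lastrowid

first_id = store_chunk("Decided to rotate gateway creds weekly")
```

Storing both a vector and the raw text is what later enables the hybrid (vector + keyword) retrieval described in Stage 5's memory_search path.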
Stage 8 — Stop Hook (session end)
When the Claude Code session ends, on-stop.sh fires. It calls session-summary-extractor.py, which reads the bridge JSON file written at /tmp/claude-session-summary-$$.json (or falls back to plan-title inference if the bridge file is absent). The summary is appended to SESSION-AUDIT.md under the SESSION HISTORY section and enqueued for memory embedding. This is what makes “where did we leave off?” answerable at the start of the next session.
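The read-the-bridge-or-fall-back logic can be sketched as follows. The bridge-file mechanism is described above; the JSON field name and the fallback wording are assumptions.

```python
import json
import os

def extract_summary(bridge_path, plan_title):
    """Prefer the bridge JSON written during the session; if it is
    absent, fall back to a stub inferred from the plan title."""
    if os.path.exists(bridge_path):
        with open(bridge_path) as f:
            return json.load(f)["summary"]  # "summary" field is assumed
    return "(no bridge file) worked on: " + plan_title
```

Either branch yields a line that can be appended to SESSION-AUDIT.md, so the next SessionStart hook always has something to read.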
Five Worked Examples
Example 1 — Reading a Log
Henry types: “show me the gateway logs from last hour”
Routing identifies this as a read-only status query — no plan check needed, no subagent needed. Claude selects the Bash tool and runs:
journalctl --user -u openclaw-gateway --since "1 hour ago" --no-pager
The PreToolUse hook validates the command against the safety blocklist in pre-bash-check.sh — it passes. The command runs, stdout comes back, Claude formats it and returns the response. PostToolUse writes the audit entry. No LLM subagent was involved, no Portkey call was made. The total cost: zero tokens beyond the display response.
Example 2 — Searching Memory
Henry types: “what did we decide about the cred rotation?”
Routing matches the memory_search MCP tool trigger. Claude calls mcp__openclaw-tools__memory_search with the query string. The MCP server invokes claude-memory.py search "cred rotation", which embeds the query using Voyage-4 and runs a hybrid search (70% vector similarity via vec0 + 30% BM25 keyword via FTS5) across the memory database. The top-K chunks are returned, formatted, and displayed. No LLM subagent. No Portkey call for the retrieval itself — Voyage-4 runs locally in the Python process.
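The 70/30 blend reduces to a weighted sum over scores from the two indexes. A toy sketch with pre-normalized scores (the real values come from vec0 cosine similarity and FTS5 BM25):

```python
# Hybrid ranking as described above: 70% vector similarity, 30% BM25.
# Both inputs are assumed to be normalized to [0, 1] before blending.
def hybrid_score(vec_sim, bm25_norm):
    return 0.7 * vec_sim + 0.3 * bm25_norm

candidates = {
    "chunk-a": hybrid_score(0.92, 0.40),  # strong semantic match
    "chunk-b": hybrid_score(0.55, 0.95),  # strong keyword match
}
best = max(candidates, key=candidates.get)
```

With these weights, a strong semantic match outranks a strong keyword match — the blend favors meaning over exact wording.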
Example 3 — Sonnet Subagent Dispatch
Henry types: “go ahead and implement the new tool_calls schema migration”
Routing identifies the keyword “go ahead” as an execution-mode trigger (per G-MODEL-ROUTING-AT-EXEC). Opus — the main conversation model — acts as dispatcher. It builds an 8-element prompt containing the task goal, evidence gathered (the migration file path, schema diff, prior test results), the tool to use (psql + the migration SQL file), acceptance criteria (grep for new column, check NOT NULL constraint), rollback command (DROP COLUMN), and rule constraints (G-NO-PLAINTEXT-CREDS, CHOKEPOINT-3). This prompt is dispatched via the Agent tool with model: sonnet.
Sonnet runs in an isolated subprocess: it reads the migration file, validates it, runs psql against the Supabase connection string, checks the output, and returns a structured result. Opus receives the result, runs its own independent verification (querying the table schema directly), and only marks the phase complete if the check passes.
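The dispatch prompt might be represented like this. The text above names six of the eight elements explicitly; the keys mirror those names, and every literal value below is illustrative.

```python
# Illustrative structure for the subagent dispatch prompt. Keys follow the
# six elements named in the example; values are placeholders, not real paths.
dispatch_prompt = {
    "task_goal": "Apply the new tool_calls schema migration",
    "evidence": ["migration file path", "schema diff", "prior test results"],
    "tool": "psql + the migration SQL file",
    "acceptance_criteria": ["grep for new column", "check NOT NULL constraint"],
    "rollback": "DROP COLUMN",
    "constraints": ["G-NO-PLAINTEXT-CREDS", "CHOKEPOINT-3"],
}
```

Packing rollback and acceptance criteria into the prompt is what lets the dispatcher verify the subagent's result independently instead of trusting its self-report.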
Example 4 — Skill Invocation
Henry types: /dispo-blast
The Tool Trigger Conditions table in CLAUDE.md matches the /dispo-blast keyword to the dispo-blast skill. Claude invokes the Skill tool, which loads ~/.claude/skills/dispo-blast/SKILL.md — a YAML document that defines the workflow, the tools to call, the pre-blast history checks (already-blasted, recent-engagement, saturation, suppression), and the constraint that --dry-run is required unless live blast is explicitly authorized. Claude re-enters the conversation with that YAML in scope and executes the skill workflow step by step. The skill wraps existing scripts — it does not reimplement logic that already lives in dispo-blast-engine.js.
Example 5 — Hook Blocking a Dangerous Command
Henry (hypothetically) types: “rm -rf /”
Claude selects the Bash tool as the appropriate tool for this shell command. Before the command runs, the PreToolUse hook fires: pre-bash-check.sh evaluates the command against its blocklist of destructive patterns. The pattern rm -rf / matches. The hook exits with code 2, which signals the Claude Code harness to block the tool call. The Bash tool never executes. Claude receives the hook’s rejection and returns an explanation to Henry. The audit log records the blocked attempt.
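A minimal sketch of such a guard, with an illustrative blocklist (not the real patterns from pre-bash-check.sh):

```python
import re

# Example destructive-command patterns; the real blocklist is larger.
BLOCKLIST = [
    r"\brm\s+-rf\s+/(\s|$)",   # recursive delete of the filesystem root
    r"\bmkfs\.",               # reformat a filesystem
    r"\bdd\s+.*of=/dev/sd",    # raw write to a disk device
]

def check(command):
    """Return 2 (block) if the command matches a destructive pattern,
    else 0 (allow). Exit code 2 signals the harness to block the call."""
    for pattern in BLOCKLIST:
        if re.search(pattern, command):
            return 2
    return 0
```

Because the harness interprets the exit code before the Bash tool runs, the check is enforced outside the model's context and cannot be talked around.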
Why Hooks Matter
The four hooks — SessionStart, PreToolUse, PostToolUse, and Stop — enforce properties that neither Claude nor the operator should have to think about manually.
SessionStart solves the cold-start problem. Without it, every session would begin from scratch, forcing Henry to re-explain context. With it, Claude starts each session already loaded with the current work queue, the active project, and the top memory hits from prior sessions.
PreToolUse is the last safety gate before any shell command or external call executes. It runs synchronously, blocking execution if the command matches the blocklist. This cannot be bypassed by prompt engineering — it runs outside Claude’s context entirely.
PostToolUse is what makes the memory system self-maintaining. Every meaningful interaction gets embedded and stored automatically, without Claude needing to remember to call a save function. The embedding pipeline runs asynchronously so it does not add latency to the response.
Stop closes the loop. Without it, session history would exist only in the conversation transcript, invisible to future sessions. The stop hook extracts the summary and appends it to SESSION-AUDIT.md, which SessionStart reads the next time. This is the mechanism that makes the system accumulate institutional memory over time rather than resetting every 24 hours.
Why Portkey Is a Chokepoint
Every LLM call in the system — whether from the main conversation, a Sonnet subagent, or an agent running independently — funnels through the Portkey proxy at :18900. This is intentional and governed by CHOKEPOINT-1.
Three things happen at the chokepoint that cannot happen if calls bypass it: First, a tool_calls row is written to Supabase with the model, cost, token counts, latency, and agent ID. This is the audit trail that makes cost attribution and drift detection possible. Second, a trace is sent to Langfuse at :18840, providing request-level observability across all providers and all agents. Third, provider routing and per-agent cache namespaces are applied consistently — the same Sonnet call from two different agents gets routed to the right provider and billed to the right cost center.
A call that bypasses Portkey produces no audit row, no Langfuse trace, and no cache hit. The tool-calls-health-check.timer runs every 5 minutes comparing Portkey call counts against tool_calls inserts; a greater-than-10% delta fires a Discord alert to #ops.
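The delta check itself is simple arithmetic. A sketch using the 10% threshold from the text; the function names and the alerting mechanism (a Discord webhook in the real system) are stand-ins:

```python
# Fraction of Portkey calls that never produced a tool_calls row.
def audit_gap(portkey_calls, tool_call_rows):
    if portkey_calls == 0:
        return 0.0
    return (portkey_calls - tool_call_rows) / portkey_calls

# Threshold from the text: a greater-than-10% delta fires the alert.
def should_alert(portkey_calls, tool_call_rows):
    return audit_gap(portkey_calls, tool_call_rows) > 0.10
```

So 170 audit rows against 200 Portkey calls (a 15% gap) alerts, while 195 against 200 (2.5%) does not.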
Where to Go Next
- Chat Execution Sequence (full diagram) — the standalone sequence diagram in more detail
- Hook Lifecycle Map — how all four hooks interact with the broader system
- Skills Primer — what skills are, how they differ from MCP tools, and how to invoke them
- Architecture Snapshot — the full service topology, port registry, and agent tier map