LLM Cost Flow
This map traces how every LLM call generates cost, gets recorded, and triggers alerts. Read this when debugging cost attribution discrepancies, investigating runaway spend, or understanding why cost-monitor fires. The Portkey proxy at :18900 is the single chokepoint — all LLM traffic must pass through it, and every call must produce a tool_calls row in Supabase before returning (CHOKEPOINT-1).
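The record-before-return rule can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: `insertRow` is a hypothetical stand-in for the Supabase insert, `callModel` is a hypothetical stand-in for the upstream request, and the fallback file name is invented for the example.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical fallback location; the real file name is not given in this map.
const FALLBACK_PATH = path.join(os.tmpdir(), "tool_calls_fallback.jsonl");

/**
 * Write one row to the database, or queue it as a JSON line if the insert fails.
 * `insertRow` is an assumed async callback, not a real Supabase API.
 */
async function recordCall(row, insertRow, fallbackPath = FALLBACK_PATH) {
  try {
    await insertRow(row);
  } catch (err) {
    // Database unreachable: append the row so a later drain can replay it.
    fs.appendFileSync(fallbackPath, JSON.stringify(row) + "\n");
  }
}

/**
 * Run an LLM call and record it before the result reaches the caller.
 * The `finally` block runs on success and on failure, so failed calls
 * are recorded too.
 */
async function callWithRecording(callModel, insertRow, meta, fallbackPath = FALLBACK_PATH) {
  const started = Date.now();
  try {
    return await callModel();
  } finally {
    await recordCall(
      {
        agent_id: meta.agent_id,
        tool_name: meta.tool_name,
        latency_ms: Date.now() - started,
      },
      insertRow,
      fallbackPath
    );
  }
}
```

Putting the write in `finally` is one way to make "no row, no return" hard to bypass by accident; the real proxy may enforce it differently.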
Diagram
graph LR
    A[Henry / Agent] -->|prompt| B[Portkey :18900]
    B -->|tier-routed + virtual-key| C{Provider Router}
    C -->|Opus / Sonnet / Haiku| D[Anthropic Max Plan]
    C -->|fallback route| E[OpenRouter]
    C -->|specialty models| F[Moonshot Kimi]
    B -->|pre-return insert| G[(tool_calls table)]
    G -->|required fields: agent_id, tool_name, latency_ms| G
    G -->|fallback if DB down| H["/tmp fallback JSONL"]
    H -->|drain within 1h| G
    G -->|health-check every 5 min| I[tool-calls-health-check.js]
    I -->|delta Portkey vs DB > 10%| J[Discord #ops alert]
    G -->|nightly aggregate| K[friction report YYYY-MM-DD.md]
    D -->|cache_control header| L{Cache Hit?}
    L -->|hit| M[Reduced token cost]
    L -->|miss| N[Full token cost]
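The "drain within 1h" edge from the fallback file back to the table could look something like the sketch below. It assumes each queued JSON line carries a hypothetical `queued_at` millisecond timestamp, and that `insertRow` and `escalate` are callbacks supplied by the caller; none of these names come from the real health-check script.

```javascript
const fs = require("node:fs");

// One hour, matching the "drain within 1h" edge in the diagram.
const MAX_QUEUE_AGE_MS = 60 * 60 * 1000;

/**
 * Replay queued rows into the database and keep the ones that still fail.
 * Rows older than one hour (or rows that cannot be dated) count as overdue
 * and trigger the hypothetical `escalate` callback.
 */
async function drainFallback(fallbackPath, insertRow, escalate, now = Date.now()) {
  if (!fs.existsSync(fallbackPath)) return { drained: 0, remaining: 0, overdue: 0 };

  const lines = fs.readFileSync(fallbackPath, "utf8").split("\n").filter(Boolean);
  const remaining = [];
  let drained = 0;
  let overdue = 0;

  for (const line of lines) {
    let row;
    try {
      row = JSON.parse(line);
      await insertRow(row);
      drained += 1;
    } catch (err) {
      // Malformed line or failed insert: keep it for the next run.
      remaining.push(line);
      const queuedAt = row ? Number(row.queued_at) : NaN;
      if (!Number.isFinite(queuedAt) || now - queuedAt > MAX_QUEUE_AGE_MS) overdue += 1;
    }
  }

  if (remaining.length === 0) fs.unlinkSync(fallbackPath);
  else fs.writeFileSync(fallbackPath, remaining.join("\n") + "\n");

  if (overdue > 0) await escalate(overdue);
  return { drained, remaining: remaining.length, overdue };
}
```

Rewriting the file with only the failed lines keeps the drain safe to repeat on every cron run, at the cost of a brief window where a concurrent writer could race with the rewrite; a real implementation would need to handle that.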
How to read this
- Portkey :18900 is the mandatory routing layer — direct calls to Anthropic bypass cost attribution entirely and violate CHOKEPOINT-1.
- tool_calls table in Supabase (project svueekfvfrvhylxygktb) must receive a row for every LLM call. Nullable fields (cost_usd, tokens_in/out) indicate Max-plan flat-rate calls; non-null values indicate paid API-tier calls.
- /tmp fallback JSONL absorbs writes when Supabase is unreachable; the health-check script drains this queue back to Postgres within 1 hour or escalates.
- Discord ops alert fires when the 5-minute cron detects >10% delta between Portkey call count and tool_calls insert count; this is the primary signal for billing bypass.
- Cache hits (via cache_control on Opus/Sonnet) reduce per-call cost significantly; the Max plan provides flat-rate coverage but cache health still affects throughput.
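Two of the rules above lend themselves to a short sketch: the >10% delta gate and the flat-rate versus paid classification. The code below is illustrative only. It assumes the delta is measured relative to Portkey's call count, and that "tokens_in/out" expands to `tokens_in` and `tokens_out` columns; the function names are hypothetical.

```javascript
// Threshold from the bullets above: alert when the gap exceeds 10%.
const DELTA_THRESHOLD = 0.1;

/**
 * Gap between Portkey's call count and the tool_calls insert count,
 * as a fraction of Portkey's count (assumed definition).
 */
function deltaRatio(portkeyCount, dbCount) {
  if (portkeyCount === 0) return dbCount === 0 ? 0 : 1;
  return Math.abs(portkeyCount - dbCount) / portkeyCount;
}

/** True when the gap is strictly greater than the threshold. */
function shouldAlert(portkeyCount, dbCount, threshold = DELTA_THRESHOLD) {
  return deltaRatio(portkeyCount, dbCount) > threshold;
}

/**
 * Classify a row by its nullable cost fields: all null means a Max-plan
 * flat-rate call, anything non-null means a paid API-tier call.
 */
function billingTier(row) {
  const costFields = [row.cost_usd, row.tokens_in, row.tokens_out];
  const allNull = costFields.every((v) => v === null || v === undefined);
  return allNull ? "max-plan-flat-rate" : "paid-api-tier";
}
```

For example, 100 Portkey calls against 89 recorded rows gives a ratio of 0.11 and would trip the gate, while 95 recorded rows would not.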
Related
- ports-topology: shows Portkey at :18900 and Anthropic routing in topology context
- agents-tier-structure: shows which agents run on Opus vs Sonnet vs Haiku (cost tiers)
- incident-response-flow: covers what happens when cost-monitor fires the Discord alert
- auth-chain-map: upstream auth chain before any LLM call is dispatched
See also
- CLAUDE.md — POSTGRES-CHOKEPOINT-1 schema (required fields, nullable fields, fallback rules)
- tool-calls-health-check.js — 5-min cron that enforces the >10% delta gate
- 2026-05-02-002-tool-calls-not-null.sql — migration that added NOT NULL enforcement on required fields
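As a rough illustration of what the NOT NULL enforcement implies for writers, a client-side check might look like the sketch below. It lists only the three required fields named in the diagram; the authoritative list lives in the schema referenced above, and `missingRequiredFields` is a hypothetical helper.

```javascript
// Required fields as named in the diagram; the full schema may list more.
const REQUIRED_FIELDS = ["agent_id", "tool_name", "latency_ms"];

/**
 * Return the required fields that are null or undefined on a row.
 * A value of 0 (for example latency_ms: 0) counts as present.
 */
function missingRequiredFields(row) {
  return REQUIRED_FIELDS.filter((f) => row[f] === null || row[f] === undefined);
}
```

Checking before the insert gives a clearer error than a Postgres constraint violation, but the database constraint remains the actual enforcement point.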