LLM Cost Flow

This map traces how every LLM call generates cost, gets recorded, and triggers alerts. Read this when debugging cost attribution discrepancies, investigating runaway spend, or understanding why cost-monitor fires. The Portkey proxy at :18900 is the single chokepoint — all LLM traffic must pass through it, and every call must produce a tool_calls row in Supabase before returning (CHOKEPOINT-1).
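
The record-before-return rule can be sketched as below. This is a minimal Python illustration under stated assumptions, not the proxy's actual code: `record_call`, `insert_row`, and the fallback file name are hypothetical, and the only row fields checked are the three required fields named in the diagram.

```python
import json
import os
import tempfile
from typing import Callable

# Hypothetical fallback location; the real file name under /tmp is not given here.
FALLBACK_PATH = os.path.join(tempfile.gettempdir(), "tool_calls_fallback.jsonl")

# Required fields as listed on the tool_calls node in the diagram.
REQUIRED_FIELDS = ("agent_id", "tool_name", "latency_ms")


def record_call(row: dict, insert_row: Callable[[dict], None],
                fallback_path: str = FALLBACK_PATH) -> str:
    """Persist one tool_calls row before the LLM response is returned.

    `insert_row` stands in for the real database insert (an assumption).
    Returns "db" if the insert succeeded, "fallback" if the row was queued.
    """
    missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    try:
        insert_row(row)
        return "db"
    except Exception:
        # Database unreachable or insert failed: append the row to the
        # JSONL queue so it can be drained back into the table later.
        with open(fallback_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(row) + "\n")
        return "fallback"
```

The caller would return the model response only after `record_call` has returned, so every completed call leaves either a table row or a queued JSONL line.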

Diagram

graph LR
    A[Henry / Agent] -->|prompt| B[Portkey :18900]
    B -->|tier-routed + virtual-key| C{Provider Router}
    C -->|Opus / Sonnet / Haiku| D[Anthropic Max Plan]
    C -->|fallback route| E[OpenRouter]
    C -->|specialty models| F[Moonshot Kimi]
    B -->|pre-return insert| G[(tool_calls table)]
    G -->|required fields: agent_id, tool_name, latency_ms| G
    G -->|fallback if DB down| H["/tmp fallback JSONL"]
    H -->|drain within 1h| G
    G -->|health-check every 5 min| I[tool-calls-health-check.js]
    I -->|delta Portkey vs DB > 10%| J[Discord #ops alert]
    G -->|nightly aggregate| K[friction report YYYY-MM-DD.md]
    D -->|cache_control header| L{Cache Hit?}
    L -->|hit| M[Reduced token cost]
    L -->|miss| N[Full token cost]

How to read this

  • Portkey :18900 is the mandatory routing layer — direct calls to Anthropic bypass cost attribution entirely and violate CHOKEPOINT-1.
  • tool_calls table in Supabase (project svueekfvfrvhylxygktb) must receive a row for every LLM call. Null values in the nullable fields (cost_usd, tokens_in, tokens_out) indicate Max-plan flat-rate calls; non-null values indicate paid API-tier calls.
  • /tmp fallback JSONL absorbs writes when Supabase is unreachable — the health-check script drains this queue back to Postgres within 1 hour or escalates.
  • Discord #ops alert fires when the 5-minute health check detects a delta of more than 10% between the Portkey call count and the tool_calls insert count. This is the primary signal for billing bypass.
  • Cache hits (via cache_control on Opus/Sonnet) lower the token cost of a call. Under the Max plan's flat rate that saving does not change the bill, but cache health still affects throughput.
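
The two health-check duties described above, draining the fallback queue and comparing call counts, can be sketched as follows. This is an illustrative Python sketch, not the contents of tool-calls-health-check.js: the function names, the `insert_row` callable, and the choice of the Portkey count as the denominator for the 10% delta are all assumptions.

```python
import json
import os
from typing import Callable


def drain_fallback(fallback_path: str, insert_row: Callable[[dict], None]) -> int:
    """Replay queued JSONL rows into the database.

    Rows whose insert still fails are written back to the queue.
    Returns the number of rows successfully drained.
    """
    if not os.path.exists(fallback_path):
        return 0
    with open(fallback_path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    remaining = []
    drained = 0
    for row in rows:
        try:
            insert_row(row)
            drained += 1
        except Exception:
            remaining.append(row)
    # Rewrite the queue with only the rows that could not be inserted.
    with open(fallback_path, "w", encoding="utf-8") as fh:
        for row in remaining:
            fh.write(json.dumps(row) + "\n")
    return drained


def call_delta(portkey_calls: int, db_rows: int) -> float:
    """Relative gap between Portkey's call count and tool_calls rows.

    Assumption: the gap is measured against the Portkey count.
    """
    if portkey_calls == 0:
        return 0.0 if db_rows == 0 else float("inf")
    return abs(portkey_calls - db_rows) / portkey_calls


def should_alert(portkey_calls: int, db_rows: int, threshold: float = 0.10) -> bool:
    """True when the delta exceeds the 10% threshold from the diagram."""
    return call_delta(portkey_calls, db_rows) > threshold
```

For example, 100 Portkey calls against 89 rows gives a delta of 0.11 and would alert, while 100 against 90 sits exactly at the threshold and would not. The drain shown here ignores concurrent writers to the queue file; a real script would need locking or an atomic rename.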

See also