Cost Tracking Hub

This is the aggregating hub for all LLM and vendor cost instrumentation across OpenClaw. It references every integration that incurs cost, documents the CHOKEPOINT-1 tool_calls table, the cost-monitor cron, Discord #ops alert routing, and per-vendor cost rates. Read this hub when diagnosing cost overruns, auditing LLM spend, verifying CHOKEPOINT-1 compliance, or adding a new cost-incurring integration. It does not own the integrations — it aggregates their cost signals.

CHOKEPOINT-1 — tool_calls (Supabase CCP)

Per CLAUDE.md POSTGRES-CHOKEPOINT Phase 1.4: Every LLM call MUST write a tool_calls row before returning. This is the single cost audit surface. Any bypass = governance violation.

Schema

| Column | Type | Required | Notes |
|---|---|---|---|
| id | uuid | Y | PK |
| agent_id | text | Y (NOT NULL) | Agent making the call |
| tool_name | text | Y (NOT NULL) | Tool or model identifier |
| parameters | jsonb | Y | MUST include tier + model keys (CHECK constraint) |
| result_ok | bool | Y (NOT NULL) | success/failure |
| latency_ms | int | Y (NOT NULL) | wall-clock latency |
| called_at | timestamptz | Y (NOT NULL) | call timestamp |
| cost_usd | numeric | nullable | NULL = Max plan flat-rate; >0 = API-tier paid |
| tokens_in | int | nullable | NULL = Max plan |
| tokens_out | int | nullable | NULL = Max plan |
| cache_read_tokens | int | nullable | prompt cache reads |
| cache_write_tokens | int | nullable | prompt cache writes |

NULL cost_usd is intentional — signals flat-rate Max plan call (not an error).
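
As a rough illustration of the schema and the NULL convention, the sketch below builds a tool_calls row in Node. Only the column names and the tier + model rule come from this hub. The function name buildToolCallRow, the shape of its arguments, the example tier value, and leaving the cache counters NULL on flat-rate calls are assumptions made for the example, not the actual write path.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical helper: builds one tool_calls row using the column names from the schema.
function buildToolCallRow({ agentId, toolName, parameters, resultOk, latencyMs, usage = null }) {
  if (!agentId || !toolName) {
    throw new Error('agent_id and tool_name are required (NOT NULL)');
  }
  // Mirrors the CHECK constraint described in the schema: parameters needs tier + model keys.
  if (!parameters || !('tier' in parameters) || !('model' in parameters)) {
    throw new Error('parameters must include tier and model keys');
  }
  return {
    id: randomUUID(),
    agent_id: agentId,
    tool_name: toolName,
    parameters,
    result_ok: Boolean(resultOk),
    latency_ms: Math.round(latencyMs),
    called_at: new Date().toISOString(),
    // usage === null marks a flat-rate Max plan call: cost and token columns stay NULL.
    cost_usd: usage ? usage.costUsd : null,
    tokens_in: usage ? usage.tokensIn : null,
    tokens_out: usage ? usage.tokensOut : null,
    // Assumption: cache counters are also left NULL when no usage data is supplied.
    cache_read_tokens: usage?.cacheReadTokens ?? null,
    cache_write_tokens: usage?.cacheWriteTokens ?? null,
  };
}
```

Calling it without a usage object yields a row whose cost_usd, tokens_in, and tokens_out are all null, which is the flat-rate signal described above. Passing usage with a positive costUsd produces an API-tier row.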

Migration files:

  • workspace/migrations/2026-05-02-001-infra-config-changes.sql
  • workspace/migrations/2026-05-02-002-tool-calls-not-null.sql

Health check

tool-calls-health-check.timer (every 5 min) compares Portkey call count vs tool_calls insert count. If delta >10% → Discord #ops alert.

node /home/opsadmin/.openclaw/workspace/scripts/tool-calls-health-check.js
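
The contents of tool-calls-health-check.js are not shown in this hub, so the following is only a sketch of how the 10% comparison could be computed. It assumes the delta is measured relative to the Portkey count. The function name checkDrift and the handling of a zero Portkey count are likewise assumptions.

```javascript
// Hypothetical drift calculation: compares call counts for the same time window.
function checkDrift(portkeyCount, toolCallsCount, thresholdPct = 10) {
  if (portkeyCount === 0) {
    // No proxy traffic: any tool_calls insert is unexplained, so flag it.
    const hasInserts = toolCallsCount !== 0;
    return { deltaPct: hasInserts ? Infinity : 0, alert: hasInserts };
  }
  // Absolute difference as a percentage of the Portkey count.
  const deltaPct = (Math.abs(portkeyCount - toolCallsCount) * 100) / portkeyCount;
  return { deltaPct, alert: deltaPct > thresholdPct };
}
```

With 200 Portkey calls and 170 tool_calls rows the delta is 15%, which would trip the alert. With 190 rows it is 5% and would not.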

Fallback on Postgres outage

If Supabase CCP is unreachable, write the tool_calls rows to /tmp/openclaw/tool-calls-fallback.jsonl instead. Drain the queue back into tool_calls within 1h or escalate.
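
A minimal sketch of the fallback queue, assuming one JSON object per line and that staleness is judged by each row's called_at. Only the file path comes from this hub. The helper names appendFallback, readFallback, and hasStaleRows are hypothetical, and the re-insert into Postgres is left out.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Path taken from this hub; everything else in this block is illustrative.
const FALLBACK_PATH = '/tmp/openclaw/tool-calls-fallback.jsonl';

// Append one row as a single JSON line; intended for when the Postgres insert fails.
function appendFallback(row, file = FALLBACK_PATH) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(row) + '\n');
}

// Read queued rows back for re-insertion, skipping blank lines.
function readFallback(file = FALLBACK_PATH) {
  if (!fs.existsSync(file)) return [];
  return fs.readFileSync(file, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// True when any queued row is older than maxAgeMs (1h by default), based on called_at.
function hasStaleRows(rows, maxAgeMs = 60 * 60 * 1000, now = Date.now()) {
  return rows.some((row) => now - Date.parse(row.called_at) > maxAgeMs);
}
```

A drain job could call readFallback, insert the rows, and then truncate the file. hasStaleRows gives a simple signal for the escalation path.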


LLM cost tiers

Primary — Anthropic Max Plan (flat-rate)

Two Max plan seats active:

| Seat | Email | Proxy port | Agents served |
|---|---|---|---|
| Primary | henryRERI | :18900 | Main interactive session |
| Secondary | teamsteph@betterfiles.com | :18903 (anthropic-max-router) | 40 agents |

Cost: Flat-rate monthly subscription (no per-token charges).

Signal in tool_calls: cost_usd = NULL, tokens_in = NULL, tokens_out = NULL.

Hub: anthropic

Routing layer — Portkey Proxy

Portkey (127.0.0.1:18900 primary, 127.0.0.1:18903 secondary) is the cost chokepoint — all LLM calls flow through it.

  • portkey-proxy/proxy.js — proxy implementation
  • Fix #1 (Kimi 404): per-model gating prevents 404 cascades
  • Fix #16 (cache_control): max-plan path respects cache_control properly
  • CHOKEPOINT-1 enforcement: tool-calls-health-check.timer every 5 min
  • Hub: portkey

Fallback — OpenRouter

Used when Anthropic is unreachable or for budget routing.

Cost: API-tier per-token; rates vary by model routed.

Hub: openrouter (Wave 2)


Per-vendor cost rates

| Vendor | Cost model | Typical monthly | Notes |
|---|---|---|---|
| Anthropic (Max plan) | Flat-rate subscription | $200/seat × 2 | NULL in tool_calls |
| OpenRouter fallback | ~$0.002-0.015 / 1K tokens | Variable | Logged with cost_usd >0 |
| Voyage-4 embeddings | ~$0.06 / 1M tokens | ~$5-15 | Via supabase vec0 |
| Apollo (enrichment) | Credit-based; ~$0.01/contact | ~$10-50 | Via il-marketplace-pull |
| Hunter.io (email enrichment) | 500 free/mo; $49/mo paid | ~$49 | Via IL enrichment pipeline |
| SalesMsg SMS | ~$0.01/msg | ~$50-200 | Via salesmsg |
| OpenPhone SMS | Included in plan | Flat | Via openphone-quo |
| Hetzner VPS | CCX33 ~€20/mo | ~$22 | hetzner |
| Supabase CCP | Pro plan; usage-based above free tier | ~$25+ | supabase |
| GitHub Actions | Free tier (private repo) | ~$0 | github |

Cost alerts for all vendors are reported to Discord #ops. See "Discord ops alert routing" below.
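
To make the per-token rates concrete, here is a small worked calculation. The rates are the approximate figures from the table in this section, not authoritative pricing, and the token volumes are made-up inputs chosen for illustration. The helper name tokenCostUsd is hypothetical.

```javascript
// cost = (tokens / unitSize) * ratePerUnit, e.g. a rate quoted per 1K or per 1M tokens.
function tokenCostUsd(tokens, ratePerUnit, unitSize) {
  return (tokens / unitSize) * ratePerUnit;
}

// OpenRouter range for 250,000 tokens at ~$0.002-0.015 per 1K tokens (illustrative volume).
const openRouterLow = tokenCostUsd(250_000, 0.002, 1_000);
const openRouterHigh = tokenCostUsd(250_000, 0.015, 1_000);

// Voyage-4 for 100M embedded tokens at ~$0.06 per 1M tokens (illustrative volume).
const voyageMonthly = tokenCostUsd(100_000_000, 0.06, 1_000_000);

console.log(openRouterLow.toFixed(2), openRouterHigh.toFixed(2), voyageMonthly.toFixed(2));
// prints "0.50 3.75 6.00"
```

At these rates, the ~$5-15 monthly Voyage figure corresponds to roughly 83M to 250M embedded tokens.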


cost-monitor cron

cost-monitor.timer runs nightly to aggregate tool_calls spend and compare against thresholds.

| Alert | Threshold | Channel |
|---|---|---|
| LLM overage | >10% delta Portkey vs tool_calls | Discord #ops |
| Daily spend spike | cost_usd sum > $X in 24h | Discord #ops |
| Voyage embedding burn | embedding_cache miss rate >50% | Discord #ops |
| Fallback queue stale | /tmp/openclaw/tool-calls-fallback.jsonl >1h old | Discord #ops |
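
The daily spend check could be sketched as follows, assuming the rows passed in are the last 24h of tool_calls. The function name summarizeDailySpend is hypothetical, and dailyLimitUsd stands in for the "$X" threshold, which this hub does not specify.

```javascript
// Sums cost_usd over a set of tool_calls rows, keeping flat-rate (NULL) calls separate.
function summarizeDailySpend(rows, dailyLimitUsd) {
  let paidCalls = 0;
  let flatRateCalls = 0;
  let totalUsd = 0;
  for (const row of rows) {
    if (row.cost_usd === null || row.cost_usd === undefined) {
      flatRateCalls += 1; // NULL cost_usd = Max plan call, adds nothing to spend
    } else {
      paidCalls += 1;
      // Some Postgres clients return numeric columns as strings, so coerce first.
      totalUsd += Number(row.cost_usd);
    }
  }
  return { paidCalls, flatRateCalls, totalUsd, alert: totalUsd > dailyLimitUsd };
}
```

Counting flat-rate calls separately keeps NULL rows from being mistaken for zero-cost API calls in the nightly summary.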

Note: The main agent owns ~80% of cron and is a bottleneck. Phase 11.2 (OSIL) plans to distribute cron ownership across specialist agents. This is flagged as an architectural debt item.


Main agent cron bottleneck

Drift finding (2026-05-01 Visual Mapping): The main agent owns ~80% of all cron timers (62 total). This is a bottleneck and single point of failure. OSIL Phase 2 will distribute cost-monitor, tool-calls-health-check, and friction-report timers to purpose-built agents.

Relevant plan: openclaw-self-improvement-layer-2026-05-03 — OSIL Phase 2 cron redistribution.


CHOKEPOINT-2 — infra_config_changes

Config changes (Portkey routing edits, model routing changes, agent route changes) MUST write an infra_config_changes row BEFORE going live.

Schema: change_type, target_system, after_state, reason, approved_by, applied_via (+ before_state, rollback_command, smoke_test_result).

Migration: workspace/migrations/2026-05-02-001-infra-config-changes.sql
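
A pre-flight check for a CHOKEPOINT-2 row might look like the sketch below. The field names come from the schema line in this section. Treating the first six as required and the three fields after the "+" as optional is an assumption of this example, and validateConfigChange is a hypothetical name. The migration file remains the authority on the actual constraints.

```javascript
// Assumption: these six must be present before a change goes live.
const REQUIRED_FIELDS = ['change_type', 'target_system', 'after_state', 'reason', 'approved_by', 'applied_via'];
// Assumption: the fields listed after "+" in the schema line are optional.
const OPTIONAL_FIELDS = ['before_state', 'rollback_command', 'smoke_test_result'];

// Reports missing required fields and any keys that are not schema columns.
function validateConfigChange(change) {
  const missing = REQUIRED_FIELDS.filter(
    (field) => change[field] === undefined || change[field] === null || change[field] === ''
  );
  const allowed = new Set([...REQUIRED_FIELDS, ...OPTIONAL_FIELDS]);
  const unknown = Object.keys(change).filter((key) => !allowed.has(key));
  return { ok: missing.length === 0 && unknown.length === 0, missing, unknown };
}
```

Running this before the insert gives a readable error list instead of a database constraint failure at go-live time.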


Supabase CCP — cost-relevant tables

| Table | Purpose | Cost signal |
|---|---|---|
| tool_calls | LLM call audit trail | PRIMARY cost surface (CHOKEPOINT-1) |
| infra_config_changes | Config change audit | Config-change gate (CHOKEPOINT-2) |
| openphone_events | SMS send audit | SMS cost tracking |
| ib_blast_history | Blast dedup + history | IB blast cost signal |
| processed_webhook_events | Webhook dedup | 24h TTL |
| webhook_audit_log | Security audit | non-blocking write |

Supabase project: CCP (svueekfvfrvhylxygktb) — PRODUCTION. All cost writes go here.

Hub: supabase


Discord ops alert routing

All cost alerts route to Discord #ops channel. Sources:

| Alert source | Timer | Condition |
|---|---|---|
| tool-calls-health-check | every 5 min | Portkey vs tool_calls delta >10% |
| cost-monitor | nightly | Spend thresholds |
| security-audit-funnel | Monday 06:00 PT | Sig failure >5%, zero-event endpoints |
| weekly-index-audit | weekly | Governance logs stale >14d |
| friction-report | nightly 02:00 PT | P0/P1/P2 friction backlog |

Hub: discord
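
This hub does not say how alerts reach #ops. The sketch below assumes a Discord incoming webhook, which accepts a JSON body with a content field of at most 2000 characters. The environment variable DISCORD_OPS_WEBHOOK_URL, both function names, and the message layout are hypothetical. It relies on the global fetch available in Node 18+.

```javascript
// Builds the webhook payload for one alert; source and condition mirror the routing table columns.
function formatOpsAlert({ source, condition, detail }) {
  const content = `[${source}] ${condition}: ${detail}`;
  // Discord rejects message content longer than 2000 characters, so truncate.
  return { content: content.slice(0, 2000) };
}

// Posts the alert to the (assumed) #ops webhook; throws if the URL is missing or the call fails.
async function postOpsAlert(alert, webhookUrl = process.env.DISCORD_OPS_WEBHOOK_URL) {
  if (!webhookUrl) throw new Error('webhook URL not configured');
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(formatOpsAlert(alert)),
  });
  if (!res.ok) throw new Error(`Discord webhook returned ${res.status}`);
}
```

Keeping the formatting separate from the network call makes the message shape easy to test without sending anything to Discord.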


Adding a new cost-incurring integration

When a new vendor is added that incurs per-call or per-unit cost:

  1. Add vendor row to the cost rates table above
  2. Ensure every call writes a tool_calls row (CHOKEPOINT-1)
  3. Add cost threshold to cost-monitor.timer config
  4. Update the relevant integration hub with cost field
  5. Register alert in Discord #ops via discord
  6. Write infra_config_changes row (CHOKEPOINT-2)

Components

  • workspace/scripts/tool-calls-health-check.js — Portkey vs tool_calls drift check (every 5 min)
  • portkey-proxy/proxy.js — LLM routing proxy (ports 18900, 18903)
  • workspace/migrations/2026-05-02-001-infra-config-changes.sql — CHOKEPOINT-2 schema
  • workspace/migrations/2026-05-02-002-tool-calls-not-null.sql — CHOKEPOINT-1 NOT NULL constraints
  • workspace/scripts/cost-monitor.js — nightly cost aggregation (if exists; flag if missing)
  • /tmp/openclaw/tool-calls-fallback.jsonl — fallback queue on Postgres outage

Integrations aggregated by this hub

  • anthropic — LLM primary (Max plan, flat-rate)
  • portkey — LLM proxy; CHOKEPOINT-1 enforcement; dual ports
  • supabase — tool_calls + infra_config_changes tables
  • discord — ops cost alert channel
  • salesmsg — SMS cost (~$0.01/msg)
  • openphone-quo — SMS/calls (flat plan)
  • hetzner — VPS hosting cost
  • 1password — credential layer (cost: subscription)
  • github — vault + CI/CD (free tier)
  • aws — EC2 Mac Ultra (on-demand; paused 2026-05-02)
  • investorlift — IL scraping (no direct API cost; Mac Ultra dependency)

Wave 2 integrations (pending hubs)

  • openrouter — OpenRouter fallback (per-token, Wave 2)
  • voyage — Voyage-4 embeddings (~$0.06/1M tokens, Wave 2)

Plans that govern this

Feedback rules

KB / source docs

  • API — token pricing
  • API — proxy routing docs
  • API — tool_calls write pattern

System maps

  • portkey — proxy layer (CHOKEPOINT anchor)
  • anthropic — primary LLM provider
  • supabase — data substrate (tool_calls table)
  • discord — ops alert channel

Open issues / TODOs

  • cost-monitor.js existence needs verification — confirm it exists in workspace/scripts/ or flag as missing
  • Main agent owns ~80% of cron (bottleneck) — OSIL Phase 2 will redistribute
  • PropStream daily quota not tracked in cost monitoring — add threshold
  • Apollo + Hunter costs need baseline month measurement (no historical in tool_calls)
  • Voyage embedding cache hit rate not tracked — add to health check

Recent activity

  • 2026-05-03: hub created by W1-S9
  • 2026-05-03: CHOKEPOINT-1/2, all integration cost rates, alert routing, and open issues documented