Hetzner Hub

Hetzner provides the primary VPS substrate for all of OpenClaw. The single production VPS (srv1347501.hstgr.cloud) hosts the gateway, 36+ agents, all webhook handlers, cron jobs, and memory DBs. This hub documents the compute tier, right-sizing history, and the Reboot #5 incident. Read it before making any infrastructure changes, service deployments, or scaling decisions. No Hetzner KB directory exists yet, so this hub is authored from CLAUDE.md, ARCHITECTURE.md, and memory entries per the Wave 1 spec.

Quick reference

| Field | Value |
|---|---|
| Vendor | Hetzner (DNS uses Hostinger shared naming; VPS is Hetzner CCX-class) |
| URL | https://console.hetzner.cloud |
| Dashboard | https://console.hetzner.cloud |
| KB doc | SOURCE MISSING: no workspace/knowledge-base/hetzner/ directory. Sources: CLAUDE.md + ARCHITECTURE.md |
| Auth method | API Token (Bearer) + web login |
| Auth credential | op://Aurora/hetzner/login (web), op://Aurora/hetzner/api-token (API) |
| Cred-proxy port | n/a |
| Webhook port | n/a |
| Webhook handler | n/a |
| Webhook dedup table | n/a |
| Tunnel path | n/a |
| Outbound API base | https://api.hetzner.cloud/v1 |
| Rate limits | 3,600 req/hr per token (Hetzner Cloud API) |
| Rate-limit action | 429 → exponential backoff (3 retries), Discord ops alert |
| Cost | CCX43 ~$168.84/mo (current); CCX33 ~$102/mo (right-sized target) |
| Backup/recovery | Hetzner snapshots (manual); no automated snapshot confirmed |
| Current instance | CCX43 (8 vCPU / 32 GB RAM, ~$168.84/mo) |
| Right-sized target | CCX33 (4 vCPU / 32 GB RAM), sized to the 6.5 GB working set |
| VPS hostname | srv1347501.hstgr.cloud |
| Tailscale hostname | srv1347501.tailb025a7.ts.net |
| Region | Hetzner Cloud, single region (location pending drift check) |
| EU candidate | Hetzner Frankfurt CCX11, €5.10/mo (BetterBets EU entity / Estonia OÜ node) |
| Discord alert channel | ops |
| Drift cadence | Daily: tool-calls-health-check.timer + G-FAILED-SERVICE-MTTR cron |
| Status | production |
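
The rate-limit action in the quick reference (429 → exponential backoff, 3 retries) could be sketched roughly as follows. This is a minimal illustration, not the deployed handler: the function names, the `HETZNER_TOKEN` and `BACKOFF_BASE` variables, and the 2-second base delay are all assumptions. Only the API base URL and Bearer auth come from the table.

```shell
#!/usr/bin/env bash
# Sketch: retry a Hetzner Cloud API call on HTTP 429 with exponential backoff.
# HETZNER_TOKEN, BACKOFF_BASE and all function names are hypothetical.

# Delay in seconds before retry N (1-based): base * 2^(N-1).
backoff_delay() {
  local attempt="$1" base="${BACKOFF_BASE:-2}"
  echo $(( base * (1 << (attempt - 1)) ))
}

# Run "$@" (which must print an HTTP status code); retry up to 3 times on 429.
# Prints the final status; returns 1 if it is still 429 after all retries.
with_backoff() {
  local max_retries=3 attempt=0 status
  while :; do
    status=$("$@")
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "$status"
      return 1   # the caller would raise the Discord ops alert here
    fi
    attempt=$((attempt + 1))
    sleep "$(backoff_delay "$attempt")"
  done
}

# Print only the HTTP status of GET /servers.
hcloud_servers_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${HETZNER_TOKEN}" \
    https://api.hetzner.cloud/v1/servers
}
```

A call such as `with_backoff hcloud_servers_status || echo "rate limit persisted"` would wait 2, 4, and 8 seconds between attempts before giving up.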

VPS substrate

All 36+ OpenClaw agents, the gateway (:18789), Portkey proxy (:18900), webhook handlers (:18790 to :18803), and 44 SQLite memory DBs run on this single VPS. The architecture is intentionally single-node for cost efficiency; the long-term migration path is Mac mini (see Reboot #5 incident below).

Resource profile

| Resource | Current (CCX43) | Working set | Right-sized (CCX33) |
|---|---|---|---|
| vCPU | 8 | | 4 |
| RAM | 32 GB | 6.5 GB active | 32 GB |
| Monthly cost | ~$168.84 | | ~$102.00 |

Finding: The 6.5 GB working set fits comfortably in CCX33 (32 GB RAM). CCX43 was provisioned during a growth phase and is now oversized. Recommendation: downsize to CCX33 and stay on the VPS for now, with Mac mini as the long-term path. See project_vps_reboot_5_internal_cascade_F1F4_2026-05-01 for the full incident analysis.
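
The arithmetic behind the finding can be checked with a short sketch. The prices and RAM figures are the ones in the resource profile; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the right-sizing arithmetic; function names are hypothetical.

# Monthly saving in USD = current price - target price.
monthly_saving() {
  awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.2f\n", cur - tgt }'
}

# Share of total RAM occupied by the working set, as a percentage.
ram_used_pct() {
  awk -v ws="$1" -v total="$2" 'BEGIN { printf "%.1f\n", ws / total * 100 }'
}

monthly_saving 168.84 102.00   # CCX43 vs CCX33 monthly price
ram_used_pct 6.5 32            # working set vs CCX33 RAM
```

This prints 66.84 and 20.3: about $66/mo saved, with the working set using roughly a fifth of CCX33 memory.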

Key paths on VPS

| Path | Purpose |
|---|---|
| /home/opsadmin/.openclaw/ | All agent configs, scripts, workspace, memory DBs |
| /home/opsadmin/.openclaw/memory/*.sqlite | 44 SQLite agent memory DBs (verified 2026-05-01) |
| /home/opsadmin/.openclaw/workspace/ | Scripts, webhooks, knowledge base, plans |
| /tmp/openclaw/ | Runtime logs, fallback JSONL queues |
| ~/.ssh/ | SSH keys including openclaw-mac.pem for EC2 Mac |

Reboot #5 — Internal oomd cascade (F1-F4 applied 2026-05-01)

Incident date: 2026-05-01

Root cause: Internal OOM daemon (oomd) cascade, NOT external hardware failure. The OOM killer terminated OpenClaw processes in a cascade once memory pressure exceeded the oomd thresholds.

Contributing factors:

  • CCX43 was over-provisioned but processes were not memory-limited (no cgroup limits)
  • Multiple agents running parallel LLM calls created spike memory pressure
  • oomd threshold tuning not applied before incident

F1-F4 fixes applied:

| Fix | Description |
|---|---|
| F1 | Set per-service MemoryHigh= + MemoryMax= cgroup limits in systemd units |
| F2 | Tuned oomd thresholds to be less aggressive during LLM call spikes |
| F3 | Added Restart=on-failure watchdog to critical services (gateway, portkey-proxy) |
| F4 | Added /tmp/openclaw/tool-calls-fallback.jsonl drain path for CHOKEPOINT-1 when Postgres unreachable post-cascade |
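
As an illustration of what F1 and F3 look like in practice, the sketch below writes a systemd drop-in that sets memory limits and restart-on-failure for one user unit. The limit values, the `RestartSec=5` delay, the file name `memory.conf`, and the `write_memory_dropin` helper are assumptions for illustration; they are not the values applied during the incident.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in with cgroup memory limits and a restart policy.
# Limit values and the write_memory_dropin name are hypothetical.

write_memory_dropin() {
  local unit="$1" high="$2" max="$3"
  local base="${4:-$HOME/.config/systemd/user}"   # drop-in root for user units
  local dir="$base/$unit.service.d"
  mkdir -p "$dir"
  # MemoryHigh throttles the service above this level; MemoryMax is the hard cap.
  cat > "$dir/memory.conf" <<EOF
[Service]
MemoryHigh=$high
MemoryMax=$max
Restart=on-failure
RestartSec=5
EOF
  echo "$dir/memory.conf"
}
```

Typical use would be `write_memory_dropin openclaw-gateway 2G 3G`, followed by `systemctl --user daemon-reload` and a restart of the unit so the limits take effect.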

Post-incident recommendation: Stay on VPS (CCX33 right-sized) → eventual migration to Mac mini. The working set (6.5 GB) is well within CCX33 capacity.

Memory reference: project_vps_reboot_5_internal_cascade_F1F4_2026-05-01 · feedback_substrate_right_size_to_working_set


Service inventory

The full service inventory is not yet canonical — CLAUDE.md §Webhook Service Port Map documents 9 of ~41 live services (audit 2026-05-01). Phase 1.5 ships workspace/port-registry.md as the authoritative source.

Query live state:

systemctl --user list-units --state=active
sudo systemctl list-units --state=active | grep openclaw
pm2 list
ss -tlnp

G-FAILED-SERVICE-MTTR: any service in failed state for >24h must be (a) fixed, (b) explicitly disabled, or (c) archived per feedback_archive_not_delete. Daily cron Discord-alerts #ops on failures.
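
A check for the 24-hour rule might look like the sketch below. It is not the deployed cron job: the helper names are hypothetical, and the live part assumes GNU `date` can parse the timestamp that `systemctl show` returns.

```shell
#!/usr/bin/env bash
# Sketch of a G-FAILED-SERVICE-MTTR check; helper names are hypothetical.

# Read `systemctl list-units --plain --no-legend` output on stdin and print
# the names of units whose ACTIVE column (third field) is "failed".
failed_units() {
  awk '$3 == "failed" { print $1 }'
}

# Succeed if a unit whose state changed at epoch $1 has been failed
# for more than 24 hours at epoch $2.
over_mttr() {
  local since="$1" now="$2"
  [ $(( now - since )) -gt 86400 ]
}

# Live check: needs a systemd user session and GNU date.
report_stale_failures() {
  local now unit ts
  now=$(date +%s)
  systemctl --user list-units --state=failed --plain --no-legend | failed_units |
  while read -r unit; do
    ts=$(systemctl --user show -p StateChangeTimestamp --value "$unit")
    if over_mttr "$(date -d "$ts" +%s)" "$now"; then
      echo "STALE: $unit failed since $ts"
    fi
  done
}
```

Each `STALE:` line is a candidate for one of the three outcomes above: fix, disable, or archive.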

G-SERVICE-PRE-START-DOC: new units must be added to CLAUDE.md port map AND workspace/ARCHITECTURE.md BEFORE first start.
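
One way to enforce this rule is a small gate that refuses to start a unit until its name appears in both documents. The sketch below assumes a plain substring match is sufficient; `unit_documented` is a hypothetical helper and the document paths passed to it depend on where the files live.

```shell
#!/usr/bin/env bash
# Sketch of a G-SERVICE-PRE-START-DOC gate; unit_documented is hypothetical.

# Succeed only if the unit name appears in every doc file given after it.
unit_documented() {
  local unit="$1"
  shift
  local doc
  for doc in "$@"; do
    if ! grep -qF -- "$unit" "$doc"; then
      echo "missing in $doc: $unit" >&2
      return 1
    fi
  done
}
```

For example, `unit_documented my-handler.service CLAUDE.md workspace/ARCHITECTURE.md && systemctl --user start my-handler.service` starts the unit only when both files mention it (`my-handler.service` is a placeholder name).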


EU node candidate — Hetzner Frankfurt

As part of the BetterBets EU entity / Estonia OÜ plan (project_betterbets_eu_entity_estonia_oue), a Hetzner Frankfurt node is being evaluated:

| Property | Value |
|---|---|
| Location | Hetzner Frankfurt (Germany, EU) |
| Instance | CCX11 (2 vCPU / 4 GB RAM) |
| Cost | €5.10/mo |
| Purpose | EU regulatory compliance node for Binance.com + prediction markets |
| Status | PLANNED: 5 blockers open (B1 CPA review, B2 Binance UBO policy, B3 PR Act 60, B4 Polymarket legal, B5 Binance×PR) |

Cannot trade today. The EU node is a dependency for EU-regulated trading operations but is not yet provisioned. Refer to betterbets-eu-entity-estonia-oue for current blocker status.


Components

  • workspace/ARCHITECTURE.md — system architecture reference (has drift note: 25 days stale as of 2026-05-02)
  • workspace/scripts/security-audit-funnel.js — weekly audit (runs on this VPS)
  • systemctl --user list-units — authoritative live service inventory
  • ~/.openclaw/tools/openclaw-vault-sync.sh — vault rsync to GitHub every 15 min
  • ~/.openclaw/tools/openclaw-vault-pull.sh — vault pull from GitHub every 5 min
  • workspace/port-registry.md — PLANNED (Phase 1.5); not yet built

How it’s used

  • Trigger: all OpenClaw workloads run on this VPS — it IS the substrate
  • Flow: Internet → Cloudflare Tunnel (edge, see cloudflare) → VPS → handler ports → agent dispatch via gateway → LLM via Portkey → Discord response
  • Agents involved: all 36+ agents — the VPS is their execution environment
  • Failure mode: VPS crash → total outage. F1-F4 reduce cascade risk. Recovery: Hetzner console reboot → services restart via systemd Restart=on-failure watchdog.
  • Success criteria: systemctl --user status openclaw-gateway portkey-proxy both active (running); gateway logs flowing at /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log
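
The success criteria in the list above can be turned into a scripted check along these lines. The helper names and the 300-second freshness window are assumptions; the unit names and log path pattern are the ones stated in the list. `stat -c` is the GNU form, so this assumes a Linux host.

```shell
#!/usr/bin/env bash
# Sketch of the success-criteria check; helper names and the 300 s window
# are hypothetical.

# Today's gateway log path, following the naming pattern in the list above.
gateway_log_path() {
  echo "/tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
}

# Succeed if file $1 was modified within the last $2 seconds (GNU stat).
log_is_fresh() {
  local file="$1" max_age="$2" mtime
  mtime=$(stat -c %Y "$file" 2>/dev/null) || return 1
  [ $(( $(date +%s) - mtime )) -le "$max_age" ]
}

# Both units must be active and the log must still be receiving writes.
# Units are checked one at a time because `is-active` with several units
# succeeds as soon as any one of them is active.
substrate_healthy() {
  local unit
  for unit in openclaw-gateway portkey-proxy; do
    systemctl --user is-active --quiet "$unit" || return 1
  done
  log_is_fresh "$(gateway_log_path)" 300
}
```

After a Hetzner console reboot, `substrate_healthy && echo OK` gives a quick pass/fail on whether the restart brought the substrate back.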

Agents that touch this

  • _summary — primary builder, runs on this VPS
  • _summary — infrastructure oversight, service monitoring
  • _summary — central orchestration, 80%+ of cron jobs

Plans that govern this

Feedback rules

KB / source docs

  • CLAUDE.md §Webhook Service Port Map — partial service table (9/41 services documented)
  • CLAUDE.md §Vault Sync Timers — vault-sync + vault-pull systemd timers
  • workspace/ARCHITECTURE.md — has 25-day drift note; live state via systemctl + ss + pm2
  • SOURCE MISSING: workspace/knowledge-base/hetzner/ — no KB dir. Authored from CLAUDE.md + ARCHITECTURE.md per Wave 1 spec.

System maps

This hub is the anchor for the Infra/compute cluster:

  • cloudflare — public edge, tunnel, WAF, DNS
  • aws — EC2 Mac Ultra (arm64), S3 buckets, IL scraping dependency
  • github — vault backup (traewayrer/openclaw-vault), CI/CD

Cost context: CCX43 (~$168.84/mo) + AWS Mac Ultra + misc = primary infra cost center. CCX33 right-sizing saves ~$66/mo. Full cost tracking: cost-tracking.

Hetzner credentials live in 1Password vault Aurora.

Open issues / TODOs

  • CCX43 → CCX33 downsize pending — saves ~$66/mo; requires coordinated downtime window
  • Phase 1.5 workspace/port-registry.md not yet built; 32 of 41 services undocumented (G-SERVICE-PRE-START-DOC technical debt)
  • Hetzner KB dir missing — create workspace/knowledge-base/hetzner/ + populate API.md and add to CLAUDE.md platforms list per G-KB-SYNC-WITH-CLAUDEMD
  • EU Frankfurt node — PLANNED, blocked on 5 BetterBets EU blockers
  • Mission Control Docker containers (:3000, :8000, :5432, :6380) exposed on 0.0.0.0 — should be localhost-only per ARCHITECTURE.md security note
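
The exposed-port issue in the list above can be verified from `ss` output. The sketch below filters listener lines for the four Mission Control ports bound to a wildcard address; `exposed_ports` is a hypothetical helper, not an existing script.

```shell
#!/usr/bin/env bash
# Sketch: flag listeners bound to all interfaces; exposed_ports is hypothetical.

# Read `ss -tlnH` output on stdin and print each port from "$@" that is
# bound to a wildcard address (0.0.0.0, [::] or *).
exposed_ports() {
  local want=" $* "
  awk -v want="$want" '
    {
      n = split($4, parts, ":")                      # Local Address:Port column
      port = parts[n]
      addr = substr($4, 1, length($4) - length(port) - 1)
      if ((addr == "0.0.0.0" || addr == "[::]" || addr == "*") &&
          index(want, " " port " ") > 0)
        print port
    }' | sort -un
}
```

Running `ss -tlnH | exposed_ports 3000 8000 5432 6380` should print nothing once the containers are rebound to localhost; any port it prints is still reachable on all interfaces.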

Recent activity

  • 2026-05-03: hub created (W1-S7 sub-agent) — sourced from CLAUDE.md + ARCHITECTURE.md (KB dir missing)
  • 2026-05-01: Reboot #5 internal oomd cascade; F1-F4 fixes applied; CCX43 right-size analysis complete
  • 2026-05-01: Phase D fragmentation-fix audit; 37 live services found vs 9 documented