Hetzner Hub
Hetzner provides the primary VPS substrate for all of OpenClaw. The single production VPS (srv1347501.hstgr.cloud) hosts the gateway, 36+ agents, all webhook handlers, cron jobs, and memory DBs. This hub documents the compute tier, right-sizing history, and the Reboot #5 incident. Read before making any infrastructure changes, service deployments, or scaling decisions. KB directory does not exist — this hub is authored from CLAUDE.md, ARCHITECTURE.md, and memory entries per the Wave 1 spec.
Quick reference
| Field | Value |
|---|---|
| Vendor | Hetzner (via Hostinger shared naming in DNS — VPS is Hetzner CCX-class) |
| URL | https://console.hetzner.cloud |
| Dashboard | https://console.hetzner.cloud |
| KB doc | SOURCE MISSING — no workspace/knowledge-base/hetzner/ directory. Sources: CLAUDE.md + ARCHITECTURE.md |
| Auth method | API Token (Bearer) + web login |
| Auth credential | op://Aurora/hetzner/login (web), op://Aurora/hetzner/api-token (API) |
| Cred-proxy port | n/a |
| Webhook port | n/a |
| Webhook handler | n/a |
| Webhook dedup table | n/a |
| Tunnel path | n/a |
| Outbound API base | https://api.hetzner.cloud/v1 |
| Rate limits | 3,600 req/hr per token (Hetzner Cloud API) |
| Rate-limit action | 429 → exponential backoff (3 retries), Discord ops alert |
| Cost | CCX43 ~$168.84/mo (CCX33 target ~$102/mo) |
| Backup/recovery | Hetzner snapshots (manual); no automated snapshot confirmed |
| Current instance | CCX43 (8 vCPU / 32 GB RAM, ~$168.84/mo) |
| Right-sized target | CCX33 (4 vCPU / 32 GB RAM — working set 6.5 GB, CCX33 right-sized) |
| VPS hostname | srv1347501.hstgr.cloud |
| Tailscale hostname | srv1347501.tailb025a7.ts.net |
| Region | Hetzner Cloud — single region (location pending drift check) |
| EU candidate | Hetzner Frankfurt CCX11 — €5.10/mo (BetterBets EU entity / Estonia OÜ node) |
| Discord alert channel | ops |
| Drift cadence | Daily — tool-calls-health-check.timer + G-FAILED-SERVICE-MTTR cron |
| Status | production |
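The 429 policy in the table (exponential backoff, 3 retries, then a Discord ops alert) can be sketched as a small helper. This is a minimal sketch, not the production client: the `do_request` callable stands in for whatever HTTP client hits `https://api.hetzner.cloud/v1`, the delay values are assumptions, and the alert path is reduced to a raised error.

```python
import time

API_BASE = "https://api.hetzner.cloud/v1"  # outbound API base from the quick-reference table

def call_with_backoff(do_request, path, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call do_request(url) and back off exponentially on HTTP 429.

    do_request is any callable returning an object with a .status_code;
    it is injected so the retry logic can be exercised without network access.
    """
    url = f"{API_BASE}{path}"
    for attempt in range(retries + 1):
        resp = do_request(url)
        if resp.status_code != 429:
            return resp
        if attempt < retries:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s between retries
    # Out of retries: surface the failure (the hub routes this to a Discord ops alert)
    raise RuntimeError(f"rate-limited after {retries} retries: {url}")
```

With `requests`, a real caller would look like `do_request=lambda url: requests.get(url, headers={"Authorization": f"Bearer {token}"})`.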
VPS substrate
All 36+ OpenClaw agents, the gateway (:18789), Portkey proxy (:18900), webhook handlers (:18790–:18803), and 44 SQLite memory DBs run on this single VPS. The architecture is intentionally single-node for cost efficiency; the long-term migration path is Mac mini (see Reboot #5 incident below).
Resource profile
| Resource | Current (CCX43) | Working set | Right-sized (CCX33) |
|---|---|---|---|
| vCPU | 8 | — | 4 |
| RAM | 32 GB | 6.5 GB active | 32 GB |
| Monthly cost | ~$168.84 | — | ~$102.00 |
Finding: the 6.5 GB working set fits comfortably in a CCX33 (32 GB RAM). The CCX43 was provisioned during a growth phase and is now over-sized. Recommendation: downsize to CCX33 and stay on the VPS, with Mac mini as the long-term migration path. See project_vps_reboot_5_internal_cascade_F1F4_2026-05-01 for the full incident analysis.
Key paths on VPS
| Path | Purpose |
|---|---|
| /home/opsadmin/.openclaw/ | All agent configs, scripts, workspace, memory DBs |
| /home/opsadmin/.openclaw/memory/*.sqlite | 44 SQLite agent memory DBs (verified 2026-05-01) |
| /home/opsadmin/.openclaw/workspace/ | Scripts, webhooks, knowledge base, plans |
| /tmp/openclaw/ | Runtime logs, fallback JSONL queues |
| ~/.ssh/ | SSH keys including openclaw-mac.pem for EC2 Mac |
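The "44 SQLite memory DBs" figure above is just a file count under the memory directory, so the verification reduces to a glob. A minimal sketch, assuming the path from the table (the directory argument is injectable so the check can run against any tree):

```python
from pathlib import Path

MEMORY_DIR = Path.home() / ".openclaw" / "memory"  # path from the table above

def count_memory_dbs(memory_dir=MEMORY_DIR):
    """Count per-agent SQLite memory DBs; this hub expects 44 on the VPS."""
    return len(list(Path(memory_dir).glob("*.sqlite")))
```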
Reboot #5 — Internal oomd cascade (F1-F4 applied 2026-05-01)
Incident date: 2026-05-01
Root cause: Internal OOM daemon (oomd) cascade — NOT external hardware failure. The VPS kernel OOM killer triggered a cascade termination of OpenClaw processes when memory pressure exceeded oomd thresholds.
Contributing factors:
- CCX43 was over-provisioned but processes were not memory-limited (no cgroup limits)
- Multiple agents running parallel LLM calls created spike memory pressure
- oomd threshold tuning not applied before the incident
F1-F4 fixes applied:
| Fix | Description |
|---|---|
| F1 | Set per-service MemoryHigh= + MemoryMax= cgroup limits in systemd units |
| F2 | Tuned oomd thresholds to be less aggressive during LLM call spikes |
| F3 | Added Restart=on-failure watchdog to critical services (gateway, portkey-proxy) |
| F4 | Added /tmp/openclaw/tool-calls-fallback.jsonl drain path for CHOKEPOINT-1 when Postgres unreachable post-cascade |
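F1 and F3 together amount to a systemd drop-in per service. A minimal sketch, assuming a user unit named openclaw-gateway.service; the unit name and the limit values are assumptions for illustration, not confirmed from the source:

```ini
# ~/.config/systemd/user/openclaw-gateway.service.d/override.conf
[Service]
# F1: cgroup memory limits — throttle at MemoryHigh before oomd escalates,
# hard-kill only at MemoryMax
MemoryHigh=2G
MemoryMax=3G
# F3: watchdog-style restart on crash
Restart=on-failure
RestartSec=5
```

Apply with `systemctl --user daemon-reload && systemctl --user restart openclaw-gateway`.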
Post-incident recommendation: Stay on VPS (CCX33 right-sized) → eventual migration to Mac mini. The working set (6.5 GB) is well within CCX33 capacity.
Memory reference: project_vps_reboot_5_internal_cascade_F1F4_2026-05-01 · feedback_substrate_right_size_to_working_set
Service inventory
The full service inventory is not yet canonical — CLAUDE.md §Webhook Service Port Map documents 9 of ~41 live services (audit 2026-05-01). Phase 1.5 ships workspace/port-registry.md as the authoritative source.
Query live state:

systemctl --user list-units --state=active
sudo systemctl list-units --state=active | grep openclaw
pm2 list
ss -tlnp

G-FAILED-SERVICE-MTTR: any service in a failed state for >24h must be (a) fixed, (b) explicitly disabled, or (c) archived per feedback_archive_not_delete. A daily cron posts Discord alerts to #ops on failures.
G-SERVICE-PRE-START-DOC: new units must be added to CLAUDE.md port map AND workspace/ARCHITECTURE.md BEFORE first start.
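The G-FAILED-SERVICE-MTTR rule can be sketched as a parser over `systemctl` output plus a threshold check. This is a hypothetical sketch: the parsing assumes the `systemctl list-units --state=failed --no-legend` column layout, the 24h threshold comes from the rule above, and the Discord alert itself is left out.

```python
FAILED_MTTR_HOURS = 24  # G-FAILED-SERVICE-MTTR threshold

def parse_failed_units(listing):
    """Extract unit names from `systemctl list-units --state=failed --no-legend` output.

    Each non-empty line looks like:
      openclaw-foo.service loaded failed failed Some description
    (a leading bullet character may appear; it is stripped first).
    """
    units = []
    for line in listing.splitlines():
        fields = line.replace("\u25cf", " ").split()
        if fields and fields[0].endswith((".service", ".timer")):
            units.append(fields[0])
    return units

def overdue_units(failed_hours_by_unit, threshold=FAILED_MTTR_HOURS):
    """Units failed longer than the MTTR threshold: must be fixed, disabled, or archived."""
    return sorted(u for u, h in failed_hours_by_unit.items() if h > threshold)
```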
EU node candidate — Hetzner Frankfurt
As part of the BetterBets EU entity / Estonia OÜ plan (project_betterbets_eu_entity_estonia_oue), a Hetzner Frankfurt node is being evaluated:
| Property | Value |
|---|---|
| Location | Hetzner Frankfurt (Germany, EU) |
| Instance | CCX11 (2 vCPU / 4 GB RAM) |
| Cost | €5.10/mo |
| Purpose | EU regulatory compliance node for Binance.com + prediction markets |
| Status | PLANNED — 5 blockers open (B1 CPA review, B2 Binance UBO policy, B3 PR Act 60, B4 Polymarket legal, B5 Binance×PR) |
Cannot trade today. The EU node is a dependency for EU-regulated trading operations but is not yet provisioned. Refer to betterbets-eu-entity-estonia-oue for current blocker status.
Components
- `workspace/ARCHITECTURE.md` — system architecture reference (has drift note: 25 days stale as of 2026-05-02)
- `workspace/scripts/security-audit-funnel.js` — weekly audit (runs on this VPS)
- `systemctl --user list-units` — authoritative live service inventory
- `~/.openclaw/tools/openclaw-vault-sync.sh` — vault rsync to GitHub every 15 min
- `~/.openclaw/tools/openclaw-vault-pull.sh` — vault pull from GitHub every 5 min
- `workspace/port-registry.md` — PLANNED (Phase 1.5); not yet built
How it’s used
- Trigger: all OpenClaw workloads run on this VPS — it IS the substrate
- Flow: Internet → Cloudflare Tunnel (edge, see cloudflare) → VPS → handler ports → agent dispatch via gateway → LLM via Portkey → Discord response
- Agents involved: all 36+ agents — the VPS is their execution environment
- Failure mode: VPS crash → total outage. F1-F4 reduce cascade risk. Recovery: Hetzner console reboot → services restart via the systemd `Restart=on-failure` watchdog.
- Success criteria: `systemctl --user status openclaw-gateway portkey-proxy` both `active (running)`; gateway logs flowing at `/tmp/openclaw/openclaw-$(date +%Y-%m-%d).log`
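The success criteria above can be expressed as a tiny check with the service states injected, so the logic is testable off-box. The unit names and log-path convention come from this hub; the checker itself is a hypothetical sketch, not an existing script:

```python
from datetime import date

def gateway_log_path(day=None):
    """Dated gateway log path per this hub's convention."""
    day = day or date.today()
    return f"/tmp/openclaw/openclaw-{day:%Y-%m-%d}.log"

def healthy(states):
    """True when both critical services report active, per the success criteria."""
    return all(states.get(u) == "active" for u in ("openclaw-gateway", "portkey-proxy"))
```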
Cross-links
Agents that touch this
- _summary — primary builder, runs on this VPS
- _summary — infrastructure oversight, service monitoring
- _summary — central orchestration, 80%+ of cron jobs
Plans that govern this
- openclaw-fragmentation-fix-2026-05-01 — Phase 1.5 ships port-registry.md
- project_vps_reboot_5_internal_cascade_F1F4_2026-05-01 — Reboot #5 incident + F1-F4
Feedback rules
- feedback_substrate_right_size_to_working_set — size instances to working set, not peak
- feedback_service_pre_start_doc — G-SERVICE-PRE-START-DOC: document before start
- feedback_failed_service_mttr — G-FAILED-SERVICE-MTTR: 24h max tolerated downtime
- feedback_archive_not_delete — archive retired services, don’t delete
- reference_hetzner_credentials_1p — Hetzner web login + API token in 1Password vault Aurora
KB / source docs
- CLAUDE.md §Webhook Service Port Map — partial service table (9/41 services documented)
- CLAUDE.md §Vault Sync Timers — vault-sync + vault-pull systemd timers
- `workspace/ARCHITECTURE.md` — has 25-day drift note; live state via `systemctl` + `ss` + `pm2`
- SOURCE MISSING: `workspace/knowledge-base/hetzner/` — no KB dir. Authored from CLAUDE.md + ARCHITECTURE.md per Wave 1 spec.
System maps
- infrastructure — full infra topology
- vm-overview — VPS service map
Related: Infra/compute cluster
This hub is the anchor for the Infra/compute cluster:
- cloudflare — public edge, tunnel, WAF, DNS
- aws — EC2 Mac Ultra (arm64), S3 buckets, IL scraping dependency
- github — vault backup (`traewayrer/openclaw-vault`), CI/CD
Cost context: CCX43 (~$168.84/mo) + AWS Mac Ultra + misc = primary infra cost center. CCX33 right-sizing saves ~$66/mo. Full cost tracking: cost-tracking.
Related: Credential layer cluster
Hetzner credentials live in 1Password vault Aurora.
- 1password — credential vault anchor
- Cred refs: `op://Aurora/hetzner/api-token` + `op://Aurora/hetzner/login`
- Memory: reference_hetzner_credentials_1p
Open issues / TODOs
- CCX43 → CCX33 downsize pending — saves ~$66/mo; requires coordinated downtime window
- Phase 1.5 `workspace/port-registry.md` not yet built — 23 of 41 services undocumented (G-SERVICE-PRE-START-DOC technical debt)
- Hetzner KB dir missing — create `workspace/knowledge-base/hetzner/`, populate API.md, and add to the CLAUDE.md platforms list per G-KB-SYNC-WITH-CLAUDEMD
- EU Frankfurt node — PLANNED, blocked on 5 BetterBets EU blockers
- Mission Control Docker containers (`:3000`, `:8000`, `:5432`, `:6380`) exposed on `0.0.0.0` — should be localhost-only per the ARCHITECTURE.md security note
Recent activity
- 2026-05-03: hub created (W1-S7 sub-agent) — sourced from CLAUDE.md + ARCHITECTURE.md (KB dir missing)
- 2026-05-01: Reboot #5 internal oomd cascade; F1-F4 fixes applied; CCX43 right-size analysis complete
- 2026-05-01: Phase D fragmentation-fix audit; 37 live services found vs 9 documented