state-active-deals-pipeline

⚡ Execution-mode model routing (G-MODEL-ROUTING-AT-EXEC)

This skill is execution-mode. Per CLAUDE.md “MANDATORY: Execution-Mode Model Routing” and the Henry directive of 2026-05-05 (“always have a popup question box asking what model to use, with recommendations + why, not just Opus/Sonnet defaults”):

Step 1 of every invocation: Invoke the /model-selector skill BEFORE dispatching any subagent. It pops the AskUserQuestion box with candidate models (Gemma 26B / Gemma 31B / Haiku / Sonnet / Kimi / Opus) ranked by fit for the stage, with cost / latency / “why” per option. Henry picks; that model is then used for the dispatch.

Per stage recommendations the popup should default to (Henry overrides):

  • Stage 1 (IL pull via aws-mac) — Haiku (mechanical SSH+curl, no reasoning)
  • Stage 2 (audit) — Sonnet (need to interpret column meanings, sample data)
  • Stage 3 (sync) — Haiku (mechanical script run)
  • Stage 4 (Gemma enrich) — Gemma 26B for the per-deal extraction (already wired in enrich-il-deals-detail-api.js --use-gemma); Sonnet for the orchestration shell
  • Stage 5 (Google geocode) — Haiku (mechanical API loop)
  • Stage 6 (photo pull) — Haiku for orchestration; the IL HTML extraction is Mac-side
  • Stage 7 (vision analysis) — see VOICE-PB-2 plan; per-photo runs the 4-model split test, orchestration is Sonnet
  • Stage 8 (HS sync) — Haiku for batch create; Sonnet if any deal needs deduplication judgment

Opus role: orchestrator + verifier + Henry-gate. Never executes mechanical work directly. First-instance novel work (e.g. building this skill v1) is Opus; once a pattern is proven it’s a popup-selectable mechanical dispatch.
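The per-stage defaults above can be sketched as a simple lookup that feeds the popup's pre-selected option. This is a hypothetical sketch; the map name, shape, and `defaultModelFor` helper are illustrative, not part of the /model-selector skill itself:

```javascript
// Hypothetical per-stage default map feeding the model-selector popup.
// Models and "why" strings mirror the recommendation list above.
const STAGE_DEFAULTS = {
  1: { model: "haiku",     why: "mechanical SSH+curl, no reasoning" },
  2: { model: "sonnet",    why: "interpret column meanings, sample data" },
  3: { model: "haiku",     why: "mechanical script run" },
  4: { model: "gemma-26b", why: "per-deal extraction already wired via --use-gemma" },
  5: { model: "haiku",     why: "mechanical API loop" },
  6: { model: "haiku",     why: "orchestration only; IL HTML extraction is Mac-side" },
  7: { model: "sonnet",    why: "orchestration; per-photo runs the 4-model split test" },
  8: { model: "haiku",     why: "batch create; escalate to sonnet for dedup judgment" },
};

// Returns the recommended default for a stage; Henry can still override
// in the popup. Throws for stages with no mechanical default (e.g. 0).
function defaultModelFor(stage) {
  const entry = STAGE_DEFAULTS[stage];
  if (!entry) throw new Error(`no default model for stage ${stage}`);
  return entry;
}

module.exports = { STAGE_DEFAULTS, defaultModelFor };
```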

Why this skill exists

Every state’s “fresh active deals” run before this followed the same stages, but as ad-hoc commands. Henry repeatedly called out:

  • “This is the same process — should be a skill.” → V1 ships now.
  • “Stop being assumptive.” → Audit logic now distinguishes real street addresses (parsed from description, geocoded) from marketing-title strings (“Subto Rental Opportunity”, “100k Spread Daytona Flip”) that look populated but aren’t usable.
  • “Phone/email numbers are stale if Gemma hasn’t run with current prompts.” → Audit reports llm_extraction_method distribution explicitly so contact-coverage % isn’t misleading.
  • “InvestorLift Scraping — ALWAYS Via AWS Mac.” → All IL API calls (list AND detail) route through aws-mac. VPS is CloudFront-blocked.

Lessons embedded (from FL run 2026-05-05)

| Lesson | Where it shows up |
| --- | --- |
| address column being non-null ≠ usable address | audit.js checks address_from_google OR address_from_osm for real coverage; flags marketing titles |
| contact_phone being populated ≠ Gemma-extracted | audit.js reports llm_extraction_method distribution; (null) count = “stale extraction, redo” |
| has_photos boolean is unreliable across pipeline | audit.js checks drive_folder_url instead |
| IL list API has 1000-row default page on Supabase | Pull always uses count: 'exact' and chunks .in() queries by 100 |
| aws-mac may need Tailscale SSH re-auth periodically | Skill probes SSH first; surfaces auth URL if blocked |
| Home Mac Ultra (100.123.248.46) frequently down | Skill defaults to aws-mac (100.90.38.109) for all IL calls |
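The 1000-row lesson above comes down to one pattern: chunk the raw_id list before `.in()`. A minimal sketch, assuming a standard `@supabase/supabase-js` client and the column names used elsewhere in this skill:

```javascript
// Split an array into fixed-size batches (100 per the lesson above).
function chunk(items, size = 100) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Fetch acquisition_deals rows by raw_id in batches of 100 so no single
// select can hit Supabase's 1000-row default page. `supabase` is an
// assumed @supabase/supabase-js client instance.
async function fetchDealsByRawIds(supabase, rawIds) {
  const rows = [];
  for (const batch of chunk(rawIds, 100)) {
    const { data, error } = await supabase
      .from("acquisition_deals")
      .select("raw_id, address_from_google, llm_extraction_method", { count: "exact" })
      .in("raw_id", batch);
    if (error) throw error;
    rows.push(...data);
  }
  return rows;
}

module.exports = { chunk, fetchDealsByRawIds };
```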

Modes

# AUDIT ONLY — fresh API pull + honest gap report. No writes.
/state-active-deals-pipeline audit FL
/state-active-deals-pipeline audit CA
# Output: /tmp/{STATE}-active-audit.json with per-gap raw_id arrays
 
# RUN — gated execution of all stages (0-8)
/state-active-deals-pipeline run FL
/state-active-deals-pipeline run FL --gaps=insert,gemma,geocode,photos,vision,hs
# Each stage prompts for confirmation before live writes
 
# RERUN — same as RUN but with --force on all enrichment (Gemma re-extracts everything,
# photos re-pull, vision re-analyze). Use when prompts/methods changed.
/state-active-deals-pipeline rerun FL

The stages (0-8)

Stage 0 — Probe (always runs, no writes)

  • Verify aws-mac SSH (ssh aws-mac echo OK); if blocked, surface Tailscale auth URL
  • Verify IL cookie exists on aws-mac (/Users/ec2-user/openclaw/workspace/data/investorlift-cookies-raw.txt)
  • Verify cookie age (warn if >7 days; refresh via investorlift-refresh-cookies.js on Mac)

Stage 1 — Fresh IL list API pull (via aws-mac)

  • SSH aws-mac → curl IL list API with cookie → save to /tmp/investorlift-all-deals.json on Mac
  • scp back to VPS at /tmp/investorlift-all-deals.json
  • Filter to --state=XX subset, save at /tmp/il-{state}-fresh.json
  • Report: total deals fetched, state subset count, breakdown by county

Stage 2 — Honest audit (always runs after Stage 1)

  • Cross-reference fresh state raw_ids against acquisition_deals
  • For each metric, distinguish “column populated” from “really populated”:
    • Real address coverage = deals with address_from_google IS NOT NULL OR address_from_osm IS NOT NULL (NOT just address IS NOT NULL)
    • Gemma coverage = deals with llm_extraction_method IS NOT NULL (NOT contact_phone presence)
    • Photo coverage = deals with drive_folder_url IS NOT NULL (NOT has_photos boolean)
    • HS coverage = deals with hubspot_deal_id IS NOT NULL
  • Sample 3-5 records from each gap so Henry sees raw data, not just %
  • Output: console report + /tmp/{state}-active-audit.json
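The “column populated” vs “really populated” distinction above reduces to a handful of predicates over acquisition_deals rows. A minimal sketch (function name and return shape are illustrative; the column names are the ones the audit keys on):

```javascript
// Compute honest coverage percentages over an array of acquisition_deals
// rows, using the "really populated" checks from Stage 2 above.
function auditCoverage(deals) {
  const count = (pred) => deals.filter(pred).length;
  // Percentage to one decimal place; 0 for an empty input.
  const pct = (n) => (deals.length ? Math.round((n / deals.length) * 1000) / 10 : 0);
  return {
    total: deals.length,
    // Real address = geocoded, not just a marketing-title string in `address`.
    realAddressPct: pct(count((d) => d.address_from_google != null || d.address_from_osm != null)),
    // Gemma coverage = extraction method recorded, not contact_phone presence.
    gemmaPct: pct(count((d) => d.llm_extraction_method != null)),
    // Photo coverage = Drive folder written, not the has_photos boolean.
    photoPct: pct(count((d) => d.drive_folder_url != null)),
    hubspotPct: pct(count((d) => d.hubspot_deal_id != null)),
  };
}

module.exports = { auditCoverage };
```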

Stage 3 — Sync to Supabase (light writes)

  • node sync-il-api-to-supabase.js --state=XX
  • Inserts new, updates existing, marks expired (whole-table sweep — known intended behavior)
  • Verify post-sync coverage = 100% via re-query

Stage 4 — Gemma re-enrichment (all 494 if rerun mode, gap-only if run mode)

  • node enrich-il-deals-detail-api.js --state=XX --use-gemma [--force]
  • Sequential, IL rate-limited 1.5s/call → ~12 min per 100 deals
  • Refreshes: contact_phone, contact_name, contact_email, address_from_desc, financing_type, seller_motivation, closing_timeline, outreach_hook, deal_strategies, key_facts, red_flags, llm_confidence, llm_extraction_method, llm_extracted_at
  • Cost: ~$0.0001/deal → ~$0.05 for 494 deals
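The Stage 4 numbers above are easy to sanity-check. A back-of-envelope sketch (the ~7.2 s/deal figure is derived from the observed ~12 min per 100 deals, i.e. rate-limit delay plus Gemma latency, and is an assumption, not a measured constant):

```javascript
// Enrichment cost at ~$0.0001/deal, rounded to cents.
function enrichCost(nDeals, perDealUsd = 0.0001) {
  return +(nDeals * perDealUsd).toFixed(2);
}

// Wall-clock estimate: ~12 min per 100 deals observed implies ~7.2 s/deal
// (1.5 s IL rate-limit delay plus Gemma call latency).
function enrichMinutes(nDeals, secPerDeal = 7.2) {
  return Math.round((nDeals * secPerDeal) / 60);
}

module.exports = { enrichCost, enrichMinutes };
```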

Stage 5 — Google geocoding (only deals with real addresses post-Gemma)

  • For each deal where address_from_desc IS NOT NULL AND address_from_google IS NULL
  • Hit Google Geocoding API → write address_from_google + lat/lng updates
  • Cost: $0.50 per 100 deals ($5/1K)
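The Stage 5 filter-and-geocode loop can be sketched as below. This is a hedged sketch assuming Node 18+ (global `fetch`) and a `GOOGLE_MAPS_API_KEY` env var; the public Google Geocoding API endpoint and response shape are real, but the function names and update-row shape are illustrative:

```javascript
// Build a Google Geocoding API request URL for one address.
function buildGeocodeUrl(address, key) {
  const url = new URL("https://maps.googleapis.com/maps/api/geocode/json");
  url.searchParams.set("address", address);
  url.searchParams.set("key", key);
  return url.toString();
}

// Geocode only the gap deals: Gemma parsed an address but Google hasn't
// confirmed one yet, per the Stage 5 filter above. Returns update rows
// for the caller to write back to acquisition_deals.
async function geocodeGaps(deals, key = process.env.GOOGLE_MAPS_API_KEY) {
  const updates = [];
  for (const deal of deals) {
    if (deal.address_from_desc == null || deal.address_from_google != null) continue;
    const res = await fetch(buildGeocodeUrl(deal.address_from_desc, key));
    const body = await res.json();
    const hit = body.results && body.results[0];
    if (!hit) continue; // unresolvable address; leave the gap
    updates.push({
      raw_id: deal.raw_id,
      address_from_google: hit.formatted_address,
      lat: hit.geometry.location.lat,
      lng: hit.geometry.location.lng,
    });
  }
  return updates;
}

module.exports = { buildGeocodeUrl, geocodeGaps };
```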

Stage 6 — Photo pull via aws-mac

  • node il-folder-photos.js --bulk --state=XX (after MAC_HOST update from 100.123.248.46 to aws-mac)
  • SSH to aws-mac → fetch IL property HTML with cookie → extract S3 sendlift image URLs → download to /tmp/il-photos/{deal_id}/
  • Upload to GDrive (acq+dispo shared folder) → write drive_folder_url to acquisition_deals
  • ~10-30 photos per deal, ~3-4 hours for 494 deals (rate-limited)

Stage 7 — Vision analysis (NEW — uses VOICE-PB-3 batch script when ready)

  • Per photo: run the 4-model split test on a stratified sample of N photos, then the production winner on the rest
  • Models per VOICE-PB-2 plan: Gemini 2.0 Flash, Gemma 3 27B, Qwen2.5-VL 72B, Qwen3-VL 235B
  • Output: property_photos rows + acquisition_deals rollup columns
  • Cost: ~$0.40 per 1K photos on Gemini 2.0 Flash

Stage 8 — HubSpot deal+contact sync

  • For each deal where hubspot_deal_id IS NULL OR --force
  • Calls hubspot-deal-creator.js via /hubspot-deal-ingest skill
  • Creates/updates HS deal + finds-or-creates contact, associates them
  • Writes hubspot_deal_id + hubspot_contact_ids back to acquisition_deals

Composition with other skills

| Step | Calls into | Skill or script |
| --- | --- | --- |
| 1 | aws-mac IL pull | workspace/scripts/investorlift-daily-sync.sh (curl portion) |
| 3 | Sync | workspace/scripts/sync-il-api-to-supabase.js |
| 4 | Gemma enrich | workspace/scripts/enrich-il-deals-detail-api.js (uses lib/gemma-deal-extractor.js) |
| 6 | Photos | workspace/scripts/il-folder-photos.js (needs aws-mac MAC_HOST update) |
| 7 | Vision | NEW batch script (VOICE-PB-3, in progress) |
| 8 | HubSpot | /hubspot-deal-ingest skill → workspace/scripts/hubspot-deal-creator.js |

Failure modes + recovery

| Symptom | Cause | Recovery |
| --- | --- | --- |
| Tailscale SSH requires an additional check | aws-mac auth expired | Henry visits the printed https://login.tailscale.com/a/... URL once |
| IL API 403 from VPS | CloudFront IP block | Always route via aws-mac. Per CLAUDE.md hard rule. |
| IL cookie stale (>7 days) | Cookie expired | Run investorlift-refresh-cookies.js on aws-mac (Playwright auto-refresh) |
| relation "acquisition_deals" does not exist | Wrong Supabase project | Verify SUPABASE_URL = CCP project svueekfvfrvhylxygktb |
| Gemma enrich 0/N progress | Portkey proxy down OR pc-opencl-aaae2d config missing | systemctl --user status portkey-proxy + verify config |
| Photos pull SSH timeout | Mac unreachable | Confirm aws-mac (NOT 100.123.248.46); see Lessons embedded |
| HS deal create 401 | HUBSPOT_PAT rotated | Refresh PAT, update master.env |

Cost envelope (per state, assuming ~500 active deals)

| Stage | One-time | Recurring (per refresh) |
| --- | --- | --- |
| 1-3 (pull, audit, sync) | $0 | $0 |
| 4 (Gemma enrich, ~$0.0001/deal) | $0.05 | $0.05 (delta) |
| 5 (Google geocode, $5/1K) | $0.50 | $0.50 (delta) |
| 6 (photos via Mac, infra cost only) | ~$0 | ~$0 |
| 7 (vision, ~$0.40/1K on Gemini Flash) | $4 (10K photos) | $0.40 (delta) |
| 8 (HS, free) | $0 | $0 |
| Total | ~$5 | ~$1 |

Output artifacts

  • /tmp/investorlift-all-deals.json — full IL API response (all states)
  • /tmp/il-{state}-fresh.json — state-filtered subset with metadata
  • /tmp/{state}-active-audit.json — per-gap raw_id arrays + counts
  • ~/.openclaw/workspace/reports/state-pipeline-{state}-YYYY-MM-DD.md — final report

Memory cross-references

  • feedback_audit_before_architect.md — sample raw data before claiming
  • feedback_no_assumptions.md — verify column meanings, don’t infer
  • feedback_live_over_memory.md — live API > Supabase cache for “current state”
  • feedback_il_enrichment_runs_on_mac_ultra.md — IL calls always via Mac
  • reference_il_marketplace_pipeline.md — IL endpoint discovery
  • project_voice_comps_skill_2026-05-05.md — VOICE-PB-2 (vision analysis spec used in Stage 7)
  • CLAUDE.md hard rule: “InvestorLift Scraping — ALWAYS Via AWS Mac”

Trigger keywords

User says any of:

  • “fresh pull {state} active deals”
  • “redo {state} pipeline”
  • “audit {state} active deals”
  • “{state} active deals pipeline”
  • “refresh active deals end to end”
  • “rerun {state} from scratch”