state-active-deals-pipeline
⚡ Execution-mode model routing (G-MODEL-ROUTING-AT-EXEC)
This skill is execution-mode. Per CLAUDE.md “MANDATORY: Execution-Mode Model Routing” + Henry directive 2026-05-05 (“always have a popup question box asking what model to use, with recommendations + why, not just Opus/Sonnet defaults”):
Step 1 of every invocation: Invoke the /model-selector skill BEFORE dispatching any subagent. It pops the AskUserQuestion box with candidate models (Gemma 26B / Gemma 31B / Haiku / Sonnet / Kimi / Opus) ranked by fit for the stage, with cost / latency / “why” per option. Henry picks; that model is then used for the dispatch.
Per-stage recommendations the popup should default to (Henry can override):
- Stage 1 (IL pull via aws-mac) — Haiku (mechanical SSH+curl, no reasoning)
- Stage 2 (audit) — Sonnet (need to interpret column meanings, sample data)
- Stage 3 (sync) — Haiku (mechanical script run)
- Stage 4 (Gemma enrich) — Gemma 26B for the per-deal extraction (already wired in `enrich-il-deals-detail-api.js --use-gemma`); Sonnet for the orchestration shell
- Stage 5 (Google geocode) — Haiku (mechanical API loop)
- Stage 6 (photo pull) — Haiku for orchestration; the IL HTML extraction is Mac-side
- Stage 7 (vision analysis) — see VOICE-PB-2 plan; per-photo runs the 4-model split test, orchestration is Sonnet
- Stage 8 (HS sync) — Haiku for batch create; Sonnet if any deal needs deduplication judgment
Opus role: orchestrator + verifier + Henry-gate. Never executes mechanical work directly. First-instance novel work (e.g. building this skill v1) is Opus; once a pattern is proven it’s a popup-selectable mechanical dispatch.
Why this skill exists
Every state’s “fresh active deals” run before this followed the same 7 stages but as ad-hoc commands. Henry repeatedly called out:
- “This is the same process — should be a skill.” → V1 ships now.
- “Stop being assumptive.” → Audit logic now distinguishes real street addresses (parsed from description, geocoded) from marketing-title strings (“Subto Rental Opportunity”, “100k Spread Daytona Flip”) that look populated but aren’t usable.
- “Phone/email numbers are stale if Gemma hasn’t run with current prompts.” → Audit reports the `llm_extraction_method` distribution explicitly so contact-coverage % isn’t misleading.
- “InvestorLift Scraping — ALWAYS Via AWS Mac.” → All IL API calls (list AND detail) route through aws-mac. VPS is CloudFront-blocked.
Lessons embedded (from FL run 2026-05-05)
| Lesson | Where it shows up |
|---|---|
| `address` column being non-null ≠ usable address | audit.js checks `address_from_google` OR `address_from_osm` for real coverage; flags marketing titles |
| `contact_phone` being populated ≠ Gemma-extracted | audit.js reports `llm_extraction_method` distribution; (null) count = “stale extraction, redo” |
| `has_photos` boolean is unreliable across pipeline | audit.js checks `drive_folder_url` instead |
| IL list API has 1000-row default page on Supabase | Pull always uses `count: 'exact'` and chunks `.in()` queries by 100 |
| aws-mac may need Tailscale SSH re-auth periodically | Skill probes SSH first; surfaces auth URL if blocked |
| Home Mac Ultra (100.123.248.46) frequently down | Skill defaults to aws-mac (100.90.38.109) for all IL calls |
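The “chunk `.in()` queries by 100” lesson can be sketched as a small helper (function names and the `queryFn` callback are illustrative; the real logic lives in the pull/sync scripts):

```javascript
// Chunk a list of raw_ids so no single Supabase .in() filter exceeds 100 values.
function chunkIds(ids, size = 100) {
  const chunks = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}

// Sketch of a consumer: queryFn stands in for the actual Supabase call,
// e.g. one that applies .in('raw_id', chunk) and returns matching rows.
async function runChunkedQuery(ids, queryFn) {
  const rows = [];
  for (const chunk of chunkIds(ids)) {
    rows.push(...(await queryFn(chunk)));
  }
  return rows;
}
```

Chunking keeps each request under the filter-size limit while `count: 'exact'` on the list query guards against the 1000-row default page silently truncating results.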
Modes
# AUDIT ONLY — fresh API pull + honest gap report. No writes.
/state-active-deals-pipeline audit FL
/state-active-deals-pipeline audit CA
# Output: /tmp/{STATE}-active-audit.json with per-gap raw_id arrays
# RUN — gated execution of all 7 stages
/state-active-deals-pipeline run FL
/state-active-deals-pipeline run FL --gaps=insert,gemma,geocode,photos,vision,hs
# Each stage prompts for confirmation before live writes
# RERUN — same as RUN but with --force on all enrichment (Gemma re-extracts everything,
# photos re-pull, vision re-analyze). Use when prompts/methods changed.
/state-active-deals-pipeline rerun FL

The 7 stages
Stage 0 — Probe (always runs, no writes)
- Verify aws-mac SSH (`ssh aws-mac echo OK`); if blocked, surface Tailscale auth URL
- Verify IL cookie exists on aws-mac (`/Users/ec2-user/openclaw/workspace/data/investorlift-cookies-raw.txt`)
- Verify cookie age (warn if >7 days; refresh via `investorlift-refresh-cookies.js` on Mac)
Stage 1 — Fresh IL list API pull (via aws-mac)
- SSH aws-mac → curl IL list API with cookie → save to `/tmp/investorlift-all-deals.json` on Mac
- scp back to VPS at `/tmp/investorlift-all-deals.json`
- Filter to `--state=XX` subset, save at `/tmp/il-{state}-fresh.json`
- Report: total deals fetched, state subset count, breakdown by county
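The filter-and-report step can be sketched as below (the `state` and `county` field names are assumptions about the IL payload shape, not confirmed API fields):

```javascript
// Reduce the full IL pull to one state's deals.
function filterByState(deals, state) {
  return deals.filter(d => d.state === state);
}

// Summarize the state subset by county for the stage report.
function countyBreakdown(deals) {
  const counts = {};
  for (const d of deals) {
    const county = d.county || '(unknown)';
    counts[county] = (counts[county] || 0) + 1;
  }
  return counts;
}
```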
Stage 2 — Honest audit (always runs after Stage 1)
- Cross-reference fresh state raw_ids against
acquisition_deals - For each metric, distinguish “column populated” from “really populated”:
- Real address coverage = deals with
address_from_google IS NOT NULL OR address_from_osm IS NOT NULL(NOT justaddress IS NOT NULL) - Gemma coverage = deals with
llm_extraction_method IS NOT NULL(NOT contact_phone presence) - Photo coverage = deals with
drive_folder_url IS NOT NULL(NOThas_photosboolean) - HS coverage = deals with
hubspot_deal_id IS NOT NULL
- Real address coverage = deals with
- Sample 3-5 records from each gap so Henry sees raw data, not just %
- Output: console report +
/tmp/{state}-active-audit.json
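The “really populated” distinction can be sketched as predicates over a deal row (column names taken from the audit rules above; the `coverage` helper is illustrative):

```javascript
// Honest coverage predicates: a column merely being non-null is not enough.
const hasRealAddress = d => d.address_from_google != null || d.address_from_osm != null;
const hasGemmaExtraction = d => d.llm_extraction_method != null;
const hasPhotos = d => d.drive_folder_url != null;
const hasHubSpot = d => d.hubspot_deal_id != null;

// Coverage % of a predicate over the fresh state subset.
function coverage(deals, predicate) {
  if (deals.length === 0) return 0;
  return Math.round((deals.filter(predicate).length / deals.length) * 100);
}
```

A marketing-title deal (`address` = “Subto Rental Opportunity”) passes a naive `address IS NOT NULL` check but fails `hasRealAddress`, which is exactly the gap the audit is built to surface.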
Stage 3 — Sync to Supabase (light writes)
- `node sync-il-api-to-supabase.js --state=XX`
- Inserts new, updates existing, marks expired (whole-table sweep — known intended behavior)
- Verify post-sync coverage = 100% via re-query
Stage 4 — Gemma re-enrichment (all 494 if rerun mode, gap-only if run mode)
- `node enrich-il-deals-detail-api.js --state=XX --use-gemma [--force]`
- Sequential, IL rate-limited 1.5s/call → ~12 min per 100 deals
- Refreshes: contact_phone, contact_name, contact_email, address_from_desc, financing_type, seller_motivation, closing_timeline, outreach_hook, deal_strategies, key_facts, red_flags, llm_confidence, llm_extraction_method, llm_extracted_at
- Cost: ~$0.0001/deal → ~$0.05 for 494 deals
Stage 5 — Google geocoding (only deals with real addresses post-Gemma)
- For each deal where `address_from_desc IS NOT NULL` AND `address_from_google IS NULL`
- Hit Google Geocoding API → write `address_from_google` + lat/lng updates
- Cost: $0.50 per 100 deals
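The gap predicate and cost math for this stage can be sketched as (the $0.005-per-call rate follows from $0.50 per 100 deals; helper names are illustrative):

```javascript
// Deals still needing a Google pass: Gemma parsed an address from the
// description, but no Google-verified address has been written yet.
const needsGeocode = d =>
  d.address_from_desc != null && d.address_from_google == null;

// Estimated spend at ~$0.005 per geocoding request ($0.50 per 100 deals).
function geocodeCostUSD(nDeals, perCall = 0.005) {
  return nDeals * perCall;
}
```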
Stage 6 — Photo pull via aws-mac
- `node il-folder-photos.js --bulk --state=XX` (after MAC_HOST update from `100.123.248.46` → `aws-mac`)
- SSH to aws-mac → fetch IL property HTML with cookie → extract S3 sendlift image URLs → download to `/tmp/il-photos/{deal_id}/`
- Upload to GDrive (acq+dispo shared folder) → write `drive_folder_url` to acquisition_deals
- ~10-30 photos per deal, ~3-4 hours for 494 deals (rate-limited)
Stage 7 — Vision analysis (NEW — uses VOICE-PB-3 batch script when ready)
- Run the 4-model split test on an N-photo stratified sample, then the production winner on the rest
- Models per VOICE-PB-2 plan: Gemini 2.0 Flash, Gemma 3 27B, Qwen2.5-VL 72B, Qwen3-VL 235B
- Output: `property_photos` rows + `acquisition_deals` rollup columns
- Cost: ~$0.40 per 1K photos on Gemini 2.0 Flash
Stage 8 — HubSpot deal+contact sync
- For each deal where `hubspot_deal_id IS NULL` OR `--force`
- Calls `hubspot-deal-creator.js` via `/hubspot-deal-ingest` skill
- Creates/updates HS deal + finds-or-creates contact, associates them
- Writes `hubspot_deal_id` + `hubspot_contact_ids` back to `acquisition_deals`
Composition with other skills
| Step | Calls into | Skill or script |
|---|---|---|
| 1 | aws-mac IL pull | workspace/scripts/investorlift-daily-sync.sh (curl portion) |
| 3 | Sync | workspace/scripts/sync-il-api-to-supabase.js |
| 4 | Gemma enrich | workspace/scripts/enrich-il-deals-detail-api.js (uses lib/gemma-deal-extractor.js) |
| 6 | Photos | workspace/scripts/il-folder-photos.js (needs aws-mac MAC_HOST update) |
| 7 | Vision | NEW batch script (VOICE-PB-3, in progress) |
| 8 | HubSpot | /hubspot-deal-ingest skill → workspace/scripts/hubspot-deal-creator.js |
Failure modes + recovery
| Symptom | Cause | Recovery |
|---|---|---|
| Tailscale SSH blocked pending re-auth | aws-mac auth expired | Henry visits the printed https://login.tailscale.com/a/... URL once |
| IL API 403 from VPS | CloudFront IP block | Always route via aws-mac. Per CLAUDE.md hard rule. |
| IL cookie stale (>7 days) | Cookie expired | Run investorlift-refresh-cookies.js on aws-mac (Playwright auto-refresh) |
| relation "acquisition_deals" does not exist | Wrong Supabase project | Verify SUPABASE_URL = CCP `svueekfvfrvhylxygktb` |
| Gemma enrich 0/N progress | Portkey proxy down OR pc-opencl-aaae2d config missing | systemctl --user status portkey-proxy + verify config |
| Photos pull SSH timeout | Mac unreachable | Confirm aws-mac (NOT 100.123.248.46) — see Lessons embedded |
| HS deal create 401 | HUBSPOT_PAT rotated | Refresh PAT, update master.env |
Cost envelope (per state, assuming ~500 active deals)
| Stage | One-time | Recurring (per refresh) |
|---|---|---|
| 1-3 (pull, audit, sync) | $0 | $0 |
| 4 (Gemma enrich, ~$0.0001/deal) | $0.05 | $0.05 (delta) |
| 5 (Google geocode, $5/1K) | $0.50 | $0.50 (delta) |
| 6 (photos via Mac, infra cost only) | ~$0 | ~$0 |
| 7 (vision, ~$0.40/1K on Gemini Flash) | $4 (10K photos) | $0.40 (delta) |
| 8 (HS, free) | $0 | $0 |
| Total | ~$5 | ~$1 |
Output artifacts
- `/tmp/investorlift-all-deals.json` — full IL API response (all states)
- `/tmp/il-{state}-fresh.json` — state-filtered subset with metadata
- `/tmp/{state}-active-audit.json` — per-gap raw_id arrays + counts
- `~/.openclaw/workspace/reports/state-pipeline-{state}-YYYY-MM-DD.md` — final report
Memory cross-references
- `feedback_audit_before_architect.md` — sample raw data before claiming
- `feedback_no_assumptions.md` — verify column meanings, don’t infer
- `feedback_live_over_memory.md` — live API > Supabase cache for “current state”
- `feedback_il_enrichment_runs_on_mac_ultra.md` — IL calls always via Mac
- `reference_il_marketplace_pipeline.md` — IL endpoint discovery
- `project_voice_comps_skill_2026-05-05.md` — VOICE-PB-2 (vision analysis spec used in Stage 7)
- CLAUDE.md hard rule: “InvestorLift Scraping — ALWAYS Via AWS Mac”
Trigger keywords
User says any of:
- “fresh pull {state} active deals”
- “redo {state} pipeline”
- “audit {state} active deals”
- “{state} active deals pipeline”
- “refresh active deals end to end”
- “rerun {state} from scratch”