Bulk SMS Split Test
Purpose
Runs a controlled, pre-registered split test that compares multiple SMS vendors against each other and against a SalesMsg control. It enforces the plan’s no-p-hacking design through deterministic group assignment via `sha256(phone + run_id) % N_groups`, shared invariants across groups (template, state, segment, send window), and a single primary metric, cost per qualified reply, with Bonferroni correction across the vendor-vs-control contrasts.
No live outbound is permitted unless the Phase 1 evidence packets are green per NO-SEND-GATE-v1.md.
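The hashing rule above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `assignGroup`, the assumption that phone numbers are already normalized, and the choice to reduce the full 256-bit digest (rather than a prefix of it) are all hypothetical.

```javascript
const crypto = require('node:crypto');

// Assumption: `phone` is already normalized (for example E.164), so the same
// contact always produces the same input string.
function assignGroup(phone, runId, nGroups) {
  const digest = crypto.createHash('sha256').update(phone + runId).digest('hex');
  // Treat the whole 256-bit digest as one integer, then reduce modulo N_groups.
  return Number(BigInt('0x' + digest) % BigInt(nGroups));
}

// Made-up run id and phone number, for illustration only.
const exampleRunId = '3f2b6c1e-0000-4000-8000-000000000001';
console.log(assignGroup('+14155550100', exampleRunId, 4));
```

Because the group is a pure function of the phone number and the run id, re-running the assignment reproduces the same split, and a contact cannot land in two groups within one run.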
When to use
- After Approval C (Phase 1 smoke tests complete, vendors identified as finalists)
- When running a scheduled split-test iteration on a new campaign type or segment
- In dry-run mode to validate group assignment + compliance gate logic without any sends
Do NOT use for:
- Ad-hoc bulk blasts outside a pre-registered test design
- Production marketing campaigns (those use `smart-outreach-worker.js`)
Inputs
| Flag | Type | Required | Description |
|---|---|---|---|
| `--run-id` | uuid | yes | Unique per run; used for group hashing + scorecard filename |
| `--campaign-type` | string | yes | e.g. `cash_buyer_rebroadcast` |
| `--state` | string | yes | US state filter, e.g. `CA` |
| `--segment` | string | yes | e.g. `cash_buyer_warm` |
| `--template-id` | uuid | yes | Pre-approved template to render per send |
| `--n-per-group` | int | yes | Target group size (2500 to 5000 recommended) |
| `--send-window` | ISO8601 | yes | Start of send burst |
| `--vendors` | comma-list | yes | Vendor names, e.g. `telnyx,sent_dm,bandwidth` |
| `--dry-run` | enum | no (default `plan`) | One of: `plan`, `sandbox`, `live`. `plan` outputs assignment only. `sandbox` sends via each vendor’s sandbox. `live` is real traffic. |
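One way the flags in the table could be parsed and validated is sketched below, using Node's built-in `util.parseArgs` (the `default` option needs Node 18.11 or later). The function name `parseSplitTestArgs`, the error messages, and the example values are assumptions; the real tool may parse its arguments differently.

```javascript
const { parseArgs } = require('node:util');

const DRY_RUN_MODES = ['plan', 'sandbox', 'live'];
const REQUIRED_FLAGS = ['run-id', 'campaign-type', 'state', 'segment',
  'template-id', 'n-per-group', 'send-window', 'vendors'];

function parseSplitTestArgs(argv) {
  // --dry-run is optional and defaults to the safest mode, `plan`.
  const options = { 'dry-run': { type: 'string', default: 'plan' } };
  for (const name of REQUIRED_FLAGS) options[name] = { type: 'string' };
  const { values } = parseArgs({ args: argv, options, strict: true });

  const missing = REQUIRED_FLAGS.filter((name) => values[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`missing required flags: ${missing.join(', ')}`);
  }
  if (!DRY_RUN_MODES.includes(values['dry-run'])) {
    throw new Error(`--dry-run must be one of: ${DRY_RUN_MODES.join(', ')}`);
  }
  const nPerGroup = Number.parseInt(values['n-per-group'], 10);
  if (!Number.isInteger(nPerGroup) || nPerGroup <= 0) {
    throw new Error('--n-per-group must be a positive integer');
  }
  return { ...values, 'n-per-group': nPerGroup, vendors: values.vendors.split(',') };
}

// Made-up example values; only the flag names come from the table above.
const exampleArgv = [
  '--run-id=3f2b6c1e-0000-4000-8000-000000000001',
  '--campaign-type=cash_buyer_rebroadcast',
  '--state=CA',
  '--segment=cash_buyer_warm',
  '--template-id=9a1d0000-0000-4000-8000-000000000002',
  '--n-per-group=2500',
  '--send-window=2025-01-15T17:00:00Z',
  '--vendors=telnyx,sent_dm,bandwidth',
];
console.log(parseSplitTestArgs(exampleArgv));
```

Omitting `--dry-run` resolves to `plan`, so a forgotten flag can never produce real traffic.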
Outputs
- One row per send in `messaging_outbound_messages` tagged with `split_test_run_id` + `split_test_group`
- Post-run scorecard at `workspace/reports/split-tests/<run_id>.md` including delivery rate, reply rate, qualified-reply rate, cost per qualified reply per group, winner identified with p-value
- Audit log of blocked-by-compliance-gate contacts at `workspace/reports/split-tests/<run_id>-blocked.jsonl`
- Discord `#ops` summary post on scorecard completion
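As a rough sketch of the scorecard arithmetic, the primary metric and the Bonferroni decision rule might look like this. The function names, the overall significance level of 0.05, and the sample p-values are assumptions. How each vendor-vs-control p-value is computed is not specified here, so the sketch takes them as given.

```javascript
// Cost per qualified reply for one group. Returns Infinity when the group
// produced no qualified replies, so it always ranks last.
function costPerQualifiedReply(totalCostUsd, qualifiedReplies) {
  return qualifiedReplies > 0 ? totalCostUsd / qualifiedReplies : Infinity;
}

// Bonferroni correction: with m vendor-vs-control contrasts, each contrast is
// judged against alpha / m instead of alpha.
// Assumption: alpha = 0.05 overall; the protocol doc may fix another value.
function bonferroniSignificant(pValuesByVendor, alpha = 0.05) {
  const vendors = Object.keys(pValuesByVendor);
  const threshold = alpha / vendors.length;
  return Object.fromEntries(
    vendors.map((vendor) => [vendor, pValuesByVendor[vendor] < threshold])
  );
}

// Made-up p-values for three contrasts: the per-contrast threshold is 0.05 / 3.
console.log(bonferroniSignificant({ telnyx: 0.01, sent_dm: 0.03, bandwidth: 0.2 }));
// → { telnyx: true, sent_dm: false, bandwidth: false }
```

Note that `sent_dm` at p = 0.03 would pass an uncorrected 0.05 threshold but fails the corrected one, which is the false-positive inflation the correction is meant to prevent.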
Acceptance tests
- Group assignment is deterministic: `sha256(phone+run_id) % N` same input yields same group every time
- No contact appears in more than one group within a single run
- Every send passes the compliance gate BEFORE the vendor API call; blocked contacts logged with reason
- `--dry-run=plan` produces assignment report without any API calls or DB writes
- `--dry-run=sandbox` routes through each vendor’s sandbox endpoint (verified for sent.dm; NEEDS-IMPLEMENTATION for others)
- `--dry-run=live` refuses to run unless Approval C is recorded (check for `workspace/reports/vendor-evidence/<vendor>-phase1-*.md` existence for all vendors in group)
- On any rollback trigger firing mid-run, remaining queued sends are cancelled, already-sent sends are preserved, incident logged to `workspace/reports/rollback-triggers/<date>-<trigger>-<vendor>.md`
- Scorecard generated at T+72h matches the weights in VENDOR-SCORECARD-v1.md
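The live-mode refusal described in the tests above could be sketched as a file-existence check. This is only an illustration: the function names are hypothetical, and the sketch checks that an evidence file exists for each vendor, as the acceptance test describes, without inspecting whether the packet is actually green.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Returns the vendors with no `<vendor>-phase1-*.md` file in evidenceDir.
function vendorsMissingPhase1Evidence(vendors, evidenceDir) {
  const files = fs.existsSync(evidenceDir) ? fs.readdirSync(evidenceDir) : [];
  return vendors.filter(
    (vendor) => !files.some(
      (file) => file.startsWith(`${vendor}-phase1-`) && file.endsWith('.md')
    )
  );
}

// Throws unless every vendor in the run has a Phase 1 evidence file.
function assertLiveAllowed(
  vendors,
  evidenceDir = path.join('workspace', 'reports', 'vendor-evidence')
) {
  const missing = vendorsMissingPhase1Evidence(vendors, evidenceDir);
  if (missing.length > 0) {
    throw new Error(`live mode refused: no Phase 1 evidence for ${missing.join(', ')}`);
  }
}
```

Calling `assertLiveAllowed` before any queueing work means a missing packet stops the run before a single send is attempted.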
Rollback behavior
- Mid-run abort: `--abort --run-id=<uuid>` flag stops queued sends, preserves sent rows, writes incident report
- Post-run bad data: scorecard accepts a `--exclude-group=<n>` re-run flag to recompute metrics minus a contaminated group without re-sending
- If the vendor API rate-limits us mid-run: exponential backoff with a 60s ceiling between retries; sustained failure is logged as a rollback trigger
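The rate-limit handling could be sketched as below. Only the 60s ceiling comes from the rules above. The 1s base delay, the attempt cap that stands in for "sustained failure", the `err.status === 429` check, and the `rollbackTrigger` property are all assumptions made for illustration.

```javascript
const CEILING_MS = 60_000; // 60s ceiling between retries
const BASE_MS = 1_000;     // assumption: first retry waits 1s
const MAX_ATTEMPTS = 8;    // assumption: this many failures counts as sustained

// Delay doubles with each attempt and is capped at the ceiling.
function backoffDelayMs(attempt) {
  return Math.min(CEILING_MS, BASE_MS * 2 ** attempt);
}

// Retries `sendFn` on rate-limit errors; other errors propagate immediately.
async function sendWithBackoff(
  sendFn,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      return await sendFn();
    } catch (err) {
      // Assumption: the vendor client surfaces rate limits as status 429.
      if (err.status !== 429) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
  // Sustained failure: flag the error so the caller can log a rollback trigger.
  const error = new Error('sustained vendor rate limiting');
  error.rollbackTrigger = true;
  throw error;
}
```

With these assumed constants the delays run 1s, 2s, 4s, 8s, 16s, 32s, then hold at 60s.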
Related files
- Plan section: Section C + G.3 in `/home/opsadmin/.claude/plans/put-you-full-last-functional-sparrow.md`
- Implementation: `/home/opsadmin/.openclaw/tools/bulk-sms-split-test.js`
- Calls: `tools/messaging-compliance-gate.js` per contact before send
- Writes to: `messaging_outbound_messages`, `messaging_inbound_messages` (via webhook handler fan-out), `messaging_delivery_events`
- Reads from: `investorbase_buyers` for contact population, `messaging_suppression_events` for filtering
- Protocol doc: `workspace/docs/SPLIT-TEST-PROTOCOL-v1.md`
- Scorecard doc: `workspace/docs/VENDOR-SCORECARD-v1.md`
Invokes / Invoked by
Invokes: compliance-gates, SKILL
Invoked by: SKILL, SKILL