Bulk SMS Split Test

Purpose

Runs a controlled, pre-registered split test that compares multiple SMS vendors against each other and against a SalesMsg control. It enforces the plan’s no-p-hacking design: deterministic group assignment via sha256(phone + run_id) % N_groups; invariants shared across all groups (template, state, segment, send window); and a single primary metric, cost per qualified reply, with Bonferroni correction across the vendor-vs-control contrasts.
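
The assignment rule above can be sketched as follows. This is a minimal illustration, not the code in bulk-sms-split-test.js: the function name assignGroup is hypothetical, and it assumes phone and run_id are concatenated as plain strings with no separator and that the full 256-bit digest is reduced modulo the group count.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: maps a contact to a group index in [0, nGroups).
// Assumption: phone + runId is a plain string concatenation (no separator),
// and the whole SHA-256 digest is treated as one unsigned integer.
function assignGroup(phone, runId, nGroups) {
  if (!Number.isInteger(nGroups) || nGroups < 1) {
    throw new RangeError("nGroups must be a positive integer");
  }
  const digestHex = createHash("sha256").update(phone + runId).digest("hex");
  // BigInt keeps all 256 bits, so the modulo is exact rather than truncated.
  return Number(BigInt("0x" + digestHex) % BigInt(nGroups));
}

// Same inputs always map to the same group; a different run_id reshuffles.
const group = assignGroup("+14155550100", "example-run-id", 4);
console.log(group === assignGroup("+14155550100", "example-run-id", 4)); // true
```

Because the function is pure, a contact lands in exactly one group per run, provided phone numbers are normalized to a single format (for example E.164) before hashing. That normalization step is an assumption here, not something the spec states.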

No live outbound traffic is permitted until the Phase 1 evidence packets are green, per NO-SEND-GATE-v1.md.

When to use

  • After Approval C (Phase 1 smoke tests complete, vendors identified as finalists)
  • When running a scheduled split-test iteration on a new campaign type or segment
  • In dry-run mode to validate group assignment + compliance gate logic without any sends

Do NOT use for:

  • Ad-hoc bulk blasts outside a pre-registered test design
  • Production marketing campaigns (those use smart-outreach-worker.js)

Inputs

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| --run-id | uuid | yes | Unique per run; used for group hashing + scorecard filename |
| --campaign-type | string | yes | e.g. cash_buyer_rebroadcast |
| --state | string | yes | US state filter, e.g. CA |
| --segment | string | yes | e.g. cash_buyer_warm |
| --template-id | uuid | yes | Pre-approved template to render per send |
| --n-per-group | int | yes | Target group size (2500 to 5000 recommended) |
| --send-window | ISO8601 | yes | Start of send burst |
| --vendors | comma-list | yes | Vendor names, e.g. telnyx,sent_dm,bandwidth |
| --dry-run | enum | no (default: plan) | One of: plan, sandbox, live. plan outputs assignment only. sandbox sends via each vendor’s sandbox. live is real traffic. |
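
A minimal sketch of how these flags might be parsed and validated with Node's built-in util.parseArgs. The function name parseSplitTestArgs and the specific validation rules (positive integer group size, parseable timestamp, non-empty vendor list) are assumptions for illustration; the real tool may validate differently.

```javascript
const { parseArgs } = require("node:util");

const REQUIRED = ["run-id", "campaign-type", "state", "segment",
                  "template-id", "n-per-group", "send-window", "vendors"];
const DRY_RUN_MODES = ["plan", "sandbox", "live"];

// Hypothetical helper: parses argv-style input and returns normalized values.
function parseSplitTestArgs(argv) {
  const options = { "dry-run": { type: "string" } };
  for (const name of REQUIRED) options[name] = { type: "string" };
  // strict: true rejects flags that are not listed in the table.
  const { values } = parseArgs({ args: argv, options, strict: true });

  const missing = REQUIRED.filter((name) => values[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`missing required flags: ${missing.join(", ")}`);
  }
  // plan is the documented default, so an omitted flag can never send traffic.
  const dryRun = values["dry-run"] ?? "plan";
  if (!DRY_RUN_MODES.includes(dryRun)) {
    throw new Error(`--dry-run must be one of: ${DRY_RUN_MODES.join(", ")}`);
  }
  const nPerGroup = Number(values["n-per-group"]);
  if (!Number.isInteger(nPerGroup) || nPerGroup < 1) {
    throw new Error("--n-per-group must be a positive integer");
  }
  if (Number.isNaN(Date.parse(values["send-window"]))) {
    throw new Error("--send-window must be a parseable ISO 8601 timestamp");
  }
  const vendors = values.vendors.split(",").map((v) => v.trim()).filter(Boolean);
  if (vendors.length === 0) throw new Error("--vendors must name at least one vendor");

  return { ...values, "dry-run": dryRun, "n-per-group": nPerGroup, vendors };
}
```

In a script this would typically be called as parseSplitTestArgs(process.argv.slice(2)).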

Outputs

  • One row per send in messaging_outbound_messages tagged with split_test_run_id + split_test_group
  • Post-run scorecard at workspace/reports/split-tests/<run_id>.md including delivery rate, reply rate, qualified-reply rate, cost per qualified reply per group, and the winner identified with its p-value
  • Audit log of blocked-by-compliance-gate contacts at workspace/reports/split-tests/<run_id>-blocked.jsonl
  • Discord #ops summary post on scorecard completion
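
For the scorecard's primary metric and multiple-comparison handling, the arithmetic can be sketched as below. The function names, the 0.05 family-wise alpha, and the USD unit are assumptions for illustration; how the per-vendor p-values themselves are computed is defined by the protocol doc and is not shown here.

```javascript
// Cost per qualified reply for one group. Returns Infinity when the group
// produced no qualified replies, so it can never be ranked as the cheapest.
function costPerQualifiedReply(totalCostUsd, qualifiedReplies) {
  return qualifiedReplies > 0 ? totalCostUsd / qualifiedReplies : Infinity;
}

// Bonferroni correction: with m vendor-vs-control contrasts, each contrast
// is tested at alpha / m so the family-wise error rate stays at or below alpha.
function bonferroniThreshold(alpha, nContrasts) {
  return alpha / nContrasts;
}

// Returns the vendors whose contrast against control clears the corrected
// threshold. pValuesByVendor maps vendor name -> p-value (computed elsewhere).
function significantContrasts(pValuesByVendor, alpha = 0.05) {
  const vendors = Object.keys(pValuesByVendor);
  const threshold = bonferroniThreshold(alpha, vendors.length);
  return vendors.filter((vendor) => pValuesByVendor[vendor] < threshold);
}

// Illustrative numbers only: three contrasts give a threshold of 0.05 / 3.
console.log(significantContrasts({ telnyx: 0.01, sent_dm: 0.03, bandwidth: 0.2 }));
// prints [ 'telnyx' ]
```

Note that sent_dm at p = 0.03 would pass an uncorrected 0.05 test but fails the corrected threshold of roughly 0.0167, which is the behavior the no-p-hacking design is after.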

Acceptance tests

  1. Group assignment is deterministic: for sha256(phone + run_id) % N, the same input yields the same group every time
  2. No contact appears in more than one group within a single run
  3. Every send passes the compliance gate BEFORE the vendor API call; blocked contacts logged with reason
  4. --dry-run=plan produces assignment report without any API calls or DB writes
  5. --dry-run=sandbox routes through each vendor’s sandbox endpoint (verified for sent.dm; NEEDS-IMPLEMENTATION for others)
  6. --dry-run=live refuses to run unless Approval C is recorded (check for workspace/reports/vendor-evidence/<vendor>-phase1-*.md existence for all vendors in group)
  7. On any rollback trigger firing mid-run, remaining queued sends are cancelled, already-sent sends are preserved, and the incident is logged to workspace/reports/rollback-triggers/<date>-<trigger>-<vendor>.md
  8. Scorecard generated at T+72h matches the weights in VENDOR-SCORECARD-v1.md
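
Test 6 describes a file-existence check before live traffic. A minimal sketch of that check, assuming evidence files sit directly in one directory and follow the <vendor>-phase1-*.md naming; the function names are hypothetical. It only confirms that a file exists, not that the packet is green.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns the vendors with no <vendor>-phase1-*.md file in evidenceDir.
function vendorsMissingPhase1Evidence(vendors, evidenceDir) {
  const files = fs.existsSync(evidenceDir) ? fs.readdirSync(evidenceDir) : [];
  return vendors.filter((vendor) =>
    !files.some((file) => file.startsWith(`${vendor}-phase1-`) && file.endsWith(".md")));
}

// Throws when a live run is requested and any vendor lacks evidence.
// plan and sandbox modes pass through untouched.
function assertLiveAllowed(mode, vendors, evidenceDir) {
  if (mode !== "live") return;
  const missing = vendorsMissingPhase1Evidence(vendors, evidenceDir);
  if (missing.length > 0) {
    throw new Error(`refusing live run; no Phase 1 evidence for: ${missing.join(", ")}`);
  }
}

// Example call, using the evidence directory named in test 6.
const evidenceDir = path.join("workspace", "reports", "vendor-evidence");
console.log(vendorsMissingPhase1Evidence(["telnyx", "sent_dm"], evidenceDir));
```

Failing closed (a missing directory counts as missing evidence for every vendor) keeps the gate safe when the workspace is misconfigured.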

Rollback behavior

  • Mid-run abort: running with --abort --run-id=<uuid> stops queued sends, preserves sent rows, and writes an incident report
  • Post-run bad data: scorecard accepts a --exclude-group=<n> re-run flag to recompute metrics minus a contaminated group without re-sending
  • Vendor API rate-limiting mid-run: exponential backoff with a 60s ceiling between retries; sustained failure is logged as a rollback trigger

References

  • Plan section: Section C + G.3 in /home/opsadmin/.claude/plans/put-you-full-last-functional-sparrow.md
  • Implementation: /home/opsadmin/.openclaw/tools/bulk-sms-split-test.js
  • Calls: tools/messaging-compliance-gate.js per contact before send
  • Writes to: messaging_outbound_messages, messaging_inbound_messages (via webhook handler fan-out), messaging_delivery_events
  • Reads from: investorbase_buyers for contact population, messaging_suppression_events for filtering
  • Protocol doc: workspace/docs/SPLIT-TEST-PROTOCOL-v1.md
  • Scorecard doc: workspace/docs/VENDOR-SCORECARD-v1.md
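
The rate-limit handling described under Rollback behavior (exponential backoff with a 60s ceiling) can be sketched as follows. The 1s base delay, the 8-attempt limit, and the err.rateLimited marker are assumptions for illustration; the spec only fixes the 60s ceiling and the escalation to a rollback trigger on sustained failure.

```javascript
const MAX_DELAY_MS = 60_000; // the 60s ceiling between retries

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 1000) {
  return Math.min(baseMs * 2 ** attempt, MAX_DELAY_MS);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries sendFn while it fails with a rate-limit error.
// Assumption: rate-limit errors carry a truthy `rateLimited` property;
// any other error is rethrown immediately without retrying.
async function sendWithBackoff(sendFn, { maxAttempts = 8, baseMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await sendFn();
    } catch (err) {
      if (!err.rateLimited) throw err;
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  // Sustained failure: the caller is expected to log this as a rollback trigger.
  const failure = new Error("sustained rate limiting; escalate as rollback trigger");
  failure.cause = lastError;
  throw failure;
}

console.log([0, 1, 5, 6, 10].map((n) => backoffDelayMs(n)));
// prints [ 1000, 2000, 32000, 60000, 60000 ]
```

Passing sleep as an option keeps the retry loop testable without real waits.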

Invokes / Invoked by

Invokes: compliance-gates, SKILL
Invoked by: SKILL, SKILL