Bulk SMS Split Test

Purpose

Runs a controlled, pre-registered split test that compares multiple SMS vendors against each other and against a SalesMsg control. It enforces the plan’s no-p-hacking design: group assignment is deterministic via sha256(phone + run_id) % N_groups, all groups share the same invariants (template, state, segment, send window), and the primary metric is cost per qualified reply, with Bonferroni correction across the vendor-vs-control contrasts.

No live outbound is permitted until the Phase 1 evidence packets are green per NO-SEND-GATE-v1.md.
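The deterministic assignment described above can be sketched in a few lines of Node.js. This is a minimal illustration, not the code in bulk-sms-split-test.js: the function name `assignGroup`, the choice to reduce the full 256-bit digest as one integer, and the sample phone numbers and run ID are all assumptions. It also assumes phone numbers are already normalized to a single format before hashing, since two spellings of the same number would otherwise hash to different groups.

```javascript
const crypto = require('node:crypto');

// Assign a contact to a group index in [0, nGroups) from sha256(phone + run_id).
// Assumption: the whole digest is reduced as one big integer; an implementation
// that reduces only a digest prefix would produce different group numbers.
function assignGroup(phone, runId, nGroups) {
  if (!Number.isInteger(nGroups) || nGroups < 1) {
    throw new RangeError('nGroups must be a positive integer');
  }
  const digestHex = crypto
    .createHash('sha256')
    .update(phone + runId, 'utf8')
    .digest('hex');
  // BigInt keeps all 256 bits, so the modulo is exact.
  return Number(BigInt('0x' + digestHex) % BigInt(nGroups));
}

// Hypothetical sample values, for illustration only.
const sampleRunId = '00000000-0000-4000-8000-000000000000';
const samplePhones = ['+15550100001', '+15550100002', '+15550100003'];
for (const phone of samplePhones) {
  console.log(phone, '-> group', assignGroup(phone, sampleRunId, 4));
}
```

Because the group depends only on the phone number and the run ID, re-running the assignment for the same run reproduces the same groups, and a contact can only ever land in one group per run.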

When to use

After Approval C (Phase 1 smoke tests complete, vendors identified as finalists)
When running a scheduled split-test iteration on a new campaign type or segment
In dry-run mode to validate group assignment + compliance gate logic without any sends

Do NOT use for:

Ad-hoc bulk blasts outside a pre-registered test design
Production marketing campaigns (those use smart-outreach-worker.js)

Inputs

Flag	Type	Required	Description
`--run-id`	uuid	yes	Unique per run; used for group hashing + scorecard filename
`--campaign-type`	string	yes	e.g. `cash_buyer_rebroadcast`
`--state`	string	yes	US state filter, e.g. `CA`
`--segment`	string	yes	e.g. `cash_buyer_warm`
`--template-id`	uuid	yes	Pre-approved template to render per send
`--n-per-group`	int	yes	Target group size (2500 to 5000 recommended)
`--send-window`	ISO8601	yes	Start of send burst
`--vendors`	comma-list	yes	Vendor names, e.g. `telnyx,sent_dm,bandwidth`
`--dry-run`	enum	no (default `plan`)	One of: `plan`, `sandbox`, `live`. `plan` outputs the group assignment only. `sandbox` sends via each vendor’s sandbox. `live` is real traffic.
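The flag table above can be checked before any work starts. The following is a rough sketch of such a check, assuming the flags have already been parsed into a plain object of strings; the function name `normalizeFlags`, the returned field names, and the sample values are hypothetical, and the real tool may validate differently (for example, by checking that `--run-id` is a well-formed UUID).

```javascript
const DRY_RUN_MODES = ['plan', 'sandbox', 'live'];
const REQUIRED_FLAGS = [
  'run-id', 'campaign-type', 'state', 'segment',
  'template-id', 'n-per-group', 'send-window', 'vendors',
];

// Turn a raw { flag: string } map into typed options, or throw on bad input.
function normalizeFlags(raw) {
  const missing = REQUIRED_FLAGS.filter((flag) => !raw[flag]);
  if (missing.length > 0) {
    throw new Error('missing required flags: ' + missing.join(', '));
  }
  const mode = raw['dry-run'] ?? 'plan'; // documented default
  if (!DRY_RUN_MODES.includes(mode)) {
    throw new Error('--dry-run must be one of: ' + DRY_RUN_MODES.join(', '));
  }
  const nPerGroup = Number(raw['n-per-group']);
  if (!Number.isInteger(nPerGroup) || nPerGroup < 1) {
    throw new Error('--n-per-group must be a positive integer');
  }
  // Note: Date parsing accepts more than strict ISO 8601; this only rejects
  // strings that do not parse at all.
  const sendWindow = new Date(raw['send-window']);
  if (Number.isNaN(sendWindow.getTime())) {
    throw new Error('--send-window must be an ISO 8601 timestamp');
  }
  const vendors = raw.vendors.split(',').map((v) => v.trim()).filter(Boolean);
  if (vendors.length === 0 || new Set(vendors).size !== vendors.length) {
    throw new Error('--vendors must be a non-empty list without duplicates');
  }
  return {
    runId: raw['run-id'],
    campaignType: raw['campaign-type'],
    state: raw.state,
    segment: raw.segment,
    templateId: raw['template-id'],
    nPerGroup,
    sendWindow,
    vendors,
    mode,
  };
}

// Hypothetical invocation; every value below is illustrative.
const exampleOptions = normalizeFlags({
  'run-id': '00000000-0000-4000-8000-000000000000',
  'campaign-type': 'cash_buyer_rebroadcast',
  state: 'CA',
  segment: 'cash_buyer_warm',
  'template-id': '11111111-1111-4111-8111-111111111111',
  'n-per-group': '2500',
  'send-window': '2025-01-15T17:00:00Z',
  vendors: 'telnyx,sent_dm,bandwidth',
});
console.log(exampleOptions.mode, exampleOptions.vendors);
```

Defaulting to `plan` when `--dry-run` is omitted means a forgotten flag can never produce sends.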

Outputs

One row per send in messaging_outbound_messages tagged with split_test_run_id + split_test_group
Post-run scorecard at workspace/reports/split-tests/<run_id>.md, including delivery rate, reply rate, qualified-reply rate, and cost per qualified reply for each group, plus the identified winner and its p-value
Audit log of blocked-by-compliance-gate contacts at workspace/reports/split-tests/<run_id>-blocked.jsonl
Discord #ops summary post on scorecard completion
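To make the scorecard's primary metric and its multiple-comparison correction concrete, here is a small sketch. It assumes a family-wise alpha of 0.05 (this document does not state one) and that each vendor-vs-control contrast already has a p-value from whatever test SPLIT-TEST-PROTOCOL-v1.md prescribes; the function names and the numbers are hypothetical and are not results from any run.

```javascript
// Cost per qualified reply for one group; Infinity when there were no
// qualified replies, so such a group can never rank as cheapest.
function costPerQualifiedReply(totalCost, qualifiedReplies) {
  return qualifiedReplies > 0 ? totalCost / qualifiedReplies : Infinity;
}

// Bonferroni: divide the family-wise alpha by the number of contrasts.
function bonferroniThreshold(familyAlpha, nContrasts) {
  return familyAlpha / nContrasts;
}

// Return the vendors whose contrast against control clears the corrected bar.
function significantContrasts(pValuesByVendor, familyAlpha) {
  const vendors = Object.keys(pValuesByVendor);
  const threshold = bonferroniThreshold(familyAlpha, vendors.length);
  return vendors.filter((vendor) => pValuesByVendor[vendor] <= threshold);
}

// Hypothetical p-values for three vendor-vs-control contrasts.
const examplePValues = { telnyx: 0.004, sent_dm: 0.03, bandwidth: 0.2 };
console.log(significantContrasts(examplePValues, 0.05)); // [ 'telnyx' ]
console.log(costPerQualifiedReply(100, 4)); // 25
```

With three contrasts the per-contrast threshold drops from 0.05 to about 0.0167, which is why a p-value of 0.03 that looks significant in isolation does not count here.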

Acceptance tests

Group assignment is deterministic: with sha256(phone + run_id) % N_groups, the same input yields the same group every time
No contact appears in more than one group within a single run
Every send passes the compliance gate BEFORE the vendor API call; blocked contacts logged with reason
--dry-run=plan produces assignment report without any API calls or DB writes
--dry-run=sandbox routes through each vendor’s sandbox endpoint (verified for sent.dm; NEEDS-IMPLEMENTATION for others)
--dry-run=live refuses to run unless Approval C is recorded (verified by checking that workspace/reports/vendor-evidence/<vendor>-phase1-*.md exists for every vendor in the run)
If any rollback trigger fires mid-run, the remaining queued sends are cancelled, already-sent sends are preserved, and the incident is logged to workspace/reports/rollback-triggers/<date>-<trigger>-<vendor>.md
Scorecard generated at T+72h matches the weights in VENDOR-SCORECARD-v1.md
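The live-mode gate in the tests above reduces to a filename check. The sketch below is one way to express it, kept pure so it can be tested without touching the filesystem; the function names are hypothetical. It only checks that an evidence file exists for each vendor, as the acceptance test describes, and does not inspect whether the packet's contents are green.

```javascript
// Return the vendors that have no file matching <vendor>-phase1-*.md
// among the given filenames.
function vendorsMissingEvidence(vendors, evidenceFilenames) {
  return vendors.filter((vendor) => {
    const prefix = vendor + '-phase1-';
    return !evidenceFilenames.some(
      (name) => name.startsWith(prefix) && name.endsWith('.md')
    );
  });
}

// Throw before any send is queued if live mode is requested without evidence.
function assertLiveAllowed(mode, vendors, evidenceFilenames) {
  if (mode !== 'live') return; // plan and sandbox never reach real traffic
  const missing = vendorsMissingEvidence(vendors, evidenceFilenames);
  if (missing.length > 0) {
    throw new Error('live run refused; no Phase 1 evidence for: ' + missing.join(', '));
  }
}

// Hypothetical directory listing, for illustration only.
const exampleFiles = ['telnyx-phase1-smoke.md', 'sent_dm-phase1-smoke.md'];
console.log(vendorsMissingEvidence(['telnyx', 'sent_dm', 'bandwidth'], exampleFiles));
// [ 'bandwidth' ]
```

In practice the filename list would come from something like `fs.readdirSync` on workspace/reports/vendor-evidence/.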

Rollback behavior

Mid-run abort: running with --abort --run-id=<uuid> stops queued sends, preserves sent rows, and writes an incident report
Post-run bad data: the scorecard accepts a --exclude-group=<n> flag on re-run, which recomputes metrics without the contaminated group and without re-sending
Vendor API rate-limiting mid-run: exponential backoff with a 60s ceiling between retries; a sustained failure is logged as a rollback trigger
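The rate-limit behavior can be sketched as a capped exponential schedule. Only the 60s ceiling comes from this document; the 1s base delay, the eight-attempt limit used to define a sustained failure, the `rateLimited` result field, and the function names are assumptions made for illustration.

```javascript
const CEILING_MS = 60_000; // documented ceiling between retries
const BASE_MS = 1_000;     // assumption: first retry waits 1s
const MAX_ATTEMPTS = 8;    // assumption: what counts as "sustained failure"

// Delay before the retry that follows attempt number `attempt` (0-based).
function backoffDelayMs(attempt) {
  return Math.min(CEILING_MS, BASE_MS * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// sendOnce is a caller-supplied async function returning { rateLimited: boolean }.
async function sendWithBackoff(sendOnce, sleep = defaultSleep) {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const result = await sendOnce();
    if (!result.rateLimited) {
      return { ok: true, attempts: attempt + 1, result };
    }
    if (attempt < MAX_ATTEMPTS - 1) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  // Sustained failure: the caller should log this as a rollback trigger.
  return { ok: false, attempts: MAX_ATTEMPTS, rollbackTrigger: 'sustained_rate_limit' };
}

// Delays in ms: 1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000
console.log([0, 1, 2, 3, 4, 5, 6, 7].map(backoffDelayMs));
```

Taking the sleep function as a parameter lets a test substitute a no-op and exercise the retry path without waiting.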

Related files

Plan section: Section C + G.3 in /home/opsadmin/.claude/plans/put-you-full-last-functional-sparrow.md
Implementation: /home/opsadmin/.openclaw/tools/bulk-sms-split-test.js
Calls: tools/messaging-compliance-gate.js per contact before send
Writes to: messaging_outbound_messages, messaging_inbound_messages (via webhook handler fan-out), messaging_delivery_events
Reads from: investorbase_buyers for contact population, messaging_suppression_events for filtering
Protocol doc: workspace/docs/SPLIT-TEST-PROTOCOL-v1.md
Scorecard doc: workspace/docs/VENDOR-SCORECARD-v1.md

Invokes / Invoked by

Invokes: compliance-gates, SKILL
Invoked by: SKILL, SKILL
