VENDOR-SCORECARD — Messaging Vendor Evaluation Scorecard v1

Purpose: This document defines the scoring rubric used to rank messaging vendors after Phase 1 smoke tests and Phase 2 split tests. It is the canonical reference for the tools/vendor-scorecard-generator.js script. Henry uses this scorecard as an input to the Phase 3 go/no-go decision.

Scoring overview

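Each vendor receives a score from 0 to 100. The score is the sum of points across 8 categories, where each category's maximum equals its weight, so the total is equivalent to a weighted average. A vendor must score >= 65 AND pass all 11 gate checks AND trigger no rollback events to advance.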

Scores are computed twice: once after Phase 1 (smoke-test scores, small N) and once after Phase 2 (split-test scores, full N). Phase 1 scores are directional only. Phase 2 scores are the ones that matter.

Category weights and measurement

#	Category	Weight	Data source	How measured
1	Delivered rate	20%	`messaging_delivery_events`	`delivered / attempted` over Phase 2 window
2	Reply rate	15%	`messaging_inbound_messages`	Inbound messages within 72h / delivered
3	Qualified reply rate	20%	`omni_events` (intent tagged by acq agent)	`cash_buyer_interested` intents within 72h / delivered
4	Cost per qualified reply	20%	`messaging_outbound_messages.cost_cents` + qualified replies	Invoice cost / qualified replies (inverted: lower cost = higher score)
5	Compliance tooling	10%	Manual audit + gate check evidence	0-10 point subscale
6	Webhook reliability	5%	`webhook_audit_log.occurred_at` vs `messaging_delivery_events.occurred_at`	% of events logged within 60s of the event timestamp
7	Support responsiveness	5%	Manual tracking	Median first-response hours (lower = better)
8	Exportability	5%	Manual audit	0-5 point subscale

Total: 100 points. Minimum passing score: 65.
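Each category's maximum equals its weight, so the total can be computed as a plain sum of category points. The sketch below is illustrative only. The category keys are hypothetical and are not taken from tools/vendor-scorecard-generator.js.

```javascript
// Hypothetical category keys; each maximum equals the category weight above.
const CATEGORY_MAX = {
  deliveredRate: 20,
  replyRate: 15,
  qualifiedReplyRate: 20,
  costPerQualifiedReply: 20,
  complianceTooling: 10,
  webhookReliability: 5,
  supportResponsiveness: 5,
  exportability: 5,
};

const MIN_PASSING_SCORE = 65;

// Sums category points, rejecting missing or out-of-range values.
function totalScore(points) {
  let total = 0;
  for (const [category, max] of Object.entries(CATEGORY_MAX)) {
    const p = points[category];
    if (typeof p !== "number" || Number.isNaN(p) || p < 0 || p > max) {
      throw new Error(`invalid points for ${category}: ${p}`);
    }
    total += p;
  }
  return total;
}

// Example: strong on cost, roughly at parity on engagement.
const example = totalScore({
  deliveredRate: 18, replyRate: 11, qualifiedReplyRate: 14, costPerQualifiedReply: 20,
  complianceTooling: 8, webhookReliability: 4, supportResponsiveness: 4, exportability: 3.5,
});
console.log(example, example >= MIN_PASSING_SCORE); // 82.5 true
```

A score at or above 65 is necessary but not sufficient. The gate checks and rollback events still apply.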

Category 1: Delivered rate (20 points)

Formula: (delivered / attempted) * 100

Scoring scale:

Delivered rate	Points
>= 97%	20
95% to < 97%	18
92% to < 95%	15
90% to < 92%	12
85% to < 90%	8
< 85%	0 (hard floor, vendor fails regardless of other scores)

Data source: count of messaging_delivery_events rows with status = 'delivered' (numerator) vs. count of messaging_outbound_messages rows for the vendor in the split-test run (denominator).

Notes: The 85% hard floor exists because below this threshold, delivery issues are likely systemic (carrier filtering, phone number reputation, or 10DLC compliance issues) rather than statistical noise. A vendor below 85% delivery is not production-ready.
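The following is a minimal sketch of the Category 1 lookup, including the 85% hard floor. The function name and return shape are hypothetical. The band edges follow the table above, read as half-open intervals.

```javascript
// Category 1: maps delivered/attempted counts to points (max 20).
// delivered * 100 / attempted keeps integer inputs exact until the final
// division, so band edges such as exactly 97% compare cleanly.
function deliveredRatePoints(delivered, attempted) {
  if (!(attempted > 0)) throw new Error("attempted must be a positive count");
  const pct = (delivered * 100) / attempted;
  let points;
  if (pct >= 97) points = 20;
  else if (pct >= 95) points = 18;
  else if (pct >= 92) points = 15;
  else if (pct >= 90) points = 12;
  else if (pct >= 85) points = 8;
  else points = 0;
  // Below 85% the vendor fails outright, regardless of its total score.
  return { pct, points, hardFloorFailed: pct < 85 };
}

console.log(deliveredRatePoints(9612, 10000)); // { pct: 96.12, points: 18, hardFloorFailed: false }
```

The `hardFloorFailed` flag is kept separate from `points`. A score of 0 here only costs 20 points, but the hard floor must disqualify the vendor outright.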

Category 2: Reply rate (15 points)

Formula: (inbound_within_72h / delivered) * 100

Scored relative to the SalesMsg control group:

Reply rate vs control	Points
> 20% better than control	15
10-20% better	13
Within 10% of control (either direction)	11
10-20% worse than control	8
20-40% worse than control	5
> 40% worse than control	0

Rationale: Reply rate is measured relative to control because absolute reply rates vary by message content, timing, and audience. What matters for vendor selection is whether the vendor’s delivery affects engagement.
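A sketch of the relative comparison is shown below. It rests on two assumptions this document does not state. First, "X% better than control" is treated as a relative difference, not percentage points. Second, a value exactly on a band edge falls in the band closer to parity.

```javascript
// Relative difference vs. control: +0.25 means "25% better than control".
// ASSUMPTION: scorecard percentages are relative, not percentage points.
function relativeDiff(vendorRate, controlRate) {
  if (!(controlRate > 0)) throw new Error("control rate must be positive");
  return (vendorRate - controlRate) / controlRate;
}

// Category 2 bands (max 15). ASSUMPTION: exact edges go to the band closer
// to parity, e.g. exactly 20% better scores 13, exactly 40% worse scores 5.
function replyRatePoints(vendorRate, controlRate) {
  const d = relativeDiff(vendorRate, controlRate);
  if (d > 0.2) return 15;
  if (d > 0.1) return 13;
  if (d >= -0.1) return 11;
  if (d >= -0.2) return 8;
  if (d >= -0.4) return 5;
  return 0;
}

// Control replies on 3.0% of delivered messages, vendor on 3.9%: 30% better.
console.log(replyRatePoints(0.039, 0.03)); // 15
```

Under the percentage-point reading, the same pair (3.0% vs 3.9%) would be only 0.9 points better. This is why the interpretation should be fixed in the generator before Phase 2 results are scored.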

Category 3: Qualified reply rate (20 points)

Formula: (cash_buyer_interested_intents_72h / delivered) * 100

A reply counts as qualified only if it has BOTH: (a) a positive sentiment classification from the acquisitions agent AND (b) a cash_buyer_interested intent tag in omni_events.metadata. The rate is scored relative to the SalesMsg control using the same relative bands as Category 2, with points on a 20-point scale.

Qualified reply rate vs control	Points
> 20% better than control	20
10-20% better	17
Within 10% of control	14
10-20% worse	10
20-40% worse	5
> 40% worse	0

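This is the most important category because it measures the full business outcome: delivered messages that produce interested cash buyers, not just delivery or engagement. Read it together with Category 4. A vendor that converts at the same rate as control at 60% lower cost wins.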
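The qualification rule can be sketched as a filter. The event shape below is hypothetical; the real fields live in omni_events.metadata. Counting each contact at most once is an assumption. It keeps a single enthusiastic contact from inflating the rate.

```javascript
const WINDOW_MS = 72 * 60 * 60 * 1000; // 72-hour attribution window

// HYPOTHETICAL event shape: { contactId, receivedAt (epoch ms), sentiment, intents: string[] }.
// Qualified = positive sentiment AND cash_buyer_interested intent, within 72h of delivery.
function isQualifiedReply(event, deliveredAt) {
  if (deliveredAt === undefined) return false;
  const elapsed = event.receivedAt - deliveredAt;
  return (
    elapsed >= 0 &&
    elapsed <= WINDOW_MS &&
    event.sentiment === "positive" &&
    event.intents.includes("cash_buyer_interested")
  );
}

// ASSUMPTION: each contact counts at most once toward the numerator.
function qualifiedReplyRate(events, deliveredAtByContact, deliveredCount) {
  if (!(deliveredCount > 0)) throw new Error("deliveredCount must be positive");
  const qualified = new Set();
  for (const e of events) {
    if (isQualifiedReply(e, deliveredAtByContact.get(e.contactId))) qualified.add(e.contactId);
  }
  return qualified.size / deliveredCount;
}

// Category 3 bands (max 20): same relative bands as Category 2.
// ASSUMPTION: relative difference, with exact edges in the band closer to parity.
function qualifiedReplyPoints(vendorRate, controlRate) {
  if (!(controlRate > 0)) throw new Error("control rate must be positive");
  const d = (vendorRate - controlRate) / controlRate;
  if (d > 0.2) return 20;
  if (d > 0.1) return 17;
  if (d >= -0.1) return 14;
  if (d >= -0.2) return 10;
  if (d >= -0.4) return 5;
  return 0;
}
```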

Category 4: Cost per qualified reply (20 points)

Formula: vendor_cost_cents_total / qualified_replies (inverted: lower cost = higher score)

Baseline: SalesMsg cost per qualified reply from control group.

Relative to SalesMsg cost per qualified reply	Points
> 50% cheaper	20
30-50% cheaper	17
10-30% cheaper	14
Within 10% (either direction)	11
10-30% more expensive	7
30-50% more expensive	3
> 50% more expensive	0
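Because lower cost is better, the bands are inverted relative to Categories 2 and 3. The sketch below compares each vendor's cost per qualified reply against SalesMsg. Function names are hypothetical. Treating zero qualified replies as infinitely expensive is an assumption, as is placing exact edges in the band closer to parity.

```javascript
// Cost per qualified reply in cents. ASSUMPTION: zero qualified replies is
// treated as infinitely expensive, which lands in the 0-point band.
function costPerQualifiedReply(totalCostCents, qualifiedReplies) {
  return qualifiedReplies > 0 ? totalCostCents / qualifiedReplies : Infinity;
}

// Category 4 bands (max 20). Negative d means cheaper than SalesMsg.
function costPoints(vendorCpqr, controlCpqr) {
  if (!(controlCpqr > 0 && Number.isFinite(controlCpqr))) {
    throw new Error("control cost per qualified reply must be positive and finite");
  }
  const d = (vendorCpqr - controlCpqr) / controlCpqr;
  if (d < -0.5) return 20; // > 50% cheaper
  if (d < -0.3) return 17; // 30-50% cheaper
  if (d < -0.1) return 14; // 10-30% cheaper
  if (d <= 0.1) return 11; // within 10%
  if (d <= 0.3) return 7; // 10-30% more expensive
  if (d <= 0.5) return 3; // 30-50% more expensive
  return 0; // > 50% more expensive
}

// SalesMsg: 4,400 cents for 20 qualified replies (220 each).
// Vendor: 1,800 cents for 18 (100 each), about 55% cheaper.
console.log(costPoints(costPerQualifiedReply(1800, 18), costPerQualifiedReply(4400, 20))); // 20
```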

Cost inputs by vendor:

SalesMsg: $0.022 * segments (Henry’s quoted rate, no per-carrier breakdown)
Telnyx: messaging_outbound_messages.cost_cents from DLR (per-message, per-carrier passthrough)
Bandwidth: same as Telnyx
sent.dm: $0.015 per active contact (confirmed; contact = unique phone reached in billing period)
Bird: blended from DLR cost field in webhook

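sent.dm special calculation: sent.dm bills per active contact per month, not per segment. For the split test, a contact's effective per-segment cost = $0.015 / (segments sent to that contact in the billing period). Summed over that contact's segments, this always returns $0.015. The vendor's total cost for the run is therefore $0.015 × active contacts. That total must then be divided by qualified replies to put sent.dm on the same per-qualified-reply basis as the other vendors.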
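The sent.dm normalization can be sketched as follows, with SalesMsg's per-segment rate shown alongside for contrast. The input shape (a Map of phone number to segments sent) is hypothetical. The sketch assumes the whole split-test run falls inside one sent.dm billing period. A run spanning two months would bill a contact once per month.

```javascript
const SENTDM_CENTS_PER_ACTIVE_CONTACT = 1.5; // $0.015 per active contact per billing period
const SALESMSG_CENTS_PER_SEGMENT = 2.2; // $0.022 per segment (quoted rate)

// segmentsByContact: Map of phone -> segments sent in the billing period (hypothetical shape).
// A contact is "active" if it was reached at least once.
function sentDmCostCents(segmentsByContact) {
  let activeContacts = 0;
  for (const segments of segmentsByContact.values()) {
    if (segments > 0) activeContacts += 1;
  }
  return activeContacts * SENTDM_CENTS_PER_ACTIVE_CONTACT;
}

// Effective per-segment cost for one contact; times its segment count,
// it always comes back to the flat per-contact fee.
function sentDmEffectiveCentsPerSegment(segments) {
  if (!(segments > 0)) throw new Error("contact received no segments");
  return SENTDM_CENTS_PER_ACTIVE_CONTACT / segments;
}

function salesMsgCostCents(totalSegments) {
  return totalSegments * SALESMSG_CENTS_PER_SEGMENT;
}

const sent = new Map([["+15550000001", 3], ["+15550000002", 1], ["+15550000003", 0]]);
console.log(sentDmCostCents(sent)); // 3 (two active contacts)
```

Dividing `sentDmCostCents` by the vendor's qualified replies gives the figure that enters the Category 4 comparison.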

Category 5: Compliance tooling (10 points)

Subscale (10 points total, manual audit):

Sub-check	Max points	PASS criteria
Auto-STOP processing	2	Vendor auto-processes STOP on their side (not just ours); confirmed via G6 gate check evidence
10DLC UX	2	Brand/campaign registration can be completed without manual support ticket; TCR status visible in portal
Opt-out reporting	2	Opt-out counts available in vendor dashboard or API
STOP confirmation message	2	Vendor sends automatic STOP confirmation to the opted-out contact
Suppression list export	2	Phone-level suppression list exportable via API or CSV in under 5 minutes

Partial points: 1 point if the feature exists but requires a workaround. 0 if absent.
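The subscale scoring is a plain lookup: 2 for a full PASS, 1 for a workaround, 0 if absent. A minimal sketch follows. The sub-check keys are hypothetical. Scoring a sub-check missing from the audit record as absent is an assumption.

```javascript
// Hypothetical keys for the five Category 5 sub-checks, 2 points max each.
const COMPLIANCE_SUBCHECKS = [
  "autoStop", "tenDlcUx", "optOutReporting", "stopConfirmation", "suppressionExport",
];
// Full PASS = 2, exists but needs a workaround = 1, absent = 0.
const STATUS_POINTS = { pass: 2, workaround: 1, absent: 0 };

function compliancePoints(audit) {
  let total = 0;
  for (const check of COMPLIANCE_SUBCHECKS) {
    // ASSUMPTION: a sub-check missing from the audit scores as absent.
    const status = audit[check] ?? "absent";
    if (!Object.prototype.hasOwnProperty.call(STATUS_POINTS, status)) {
      throw new Error(`unknown status for ${check}: ${status}`);
    }
    total += STATUS_POINTS[status];
  }
  return total;
}

console.log(compliancePoints({
  autoStop: "pass", tenDlcUx: "workaround", optOutReporting: "pass",
  stopConfirmation: "absent", suppressionExport: "workaround",
})); // 6
```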

Category 6: Webhook reliability (5 points)

Formula: (events_arriving_within_60s / total_events) * 100

Measured across all DLR and inbound webhook events during Phase 2 window.

% events within 60s	Points
>= 98%	5
95% to < 98%	4
90% to < 95%	3
80% to < 90%	2
< 80%	0

Data source: webhook_audit_log.occurred_at vs messaging_delivery_events.occurred_at. Latency is the gap between when we logged the webhook's arrival and the timestamp of the event itself.
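A sketch of the latency check is shown below. The event shape and field names are hypothetical stand-ins for the two occurred_at columns. Two choices are assumptions: treating "within 60s" as inclusive, and counting negative gaps (clock skew) as on time.

```javascript
// events: one entry per DLR or inbound webhook (hypothetical shape), with
//   loggedAtMs   = webhook_audit_log.occurred_at (when we logged the arrival)
//   occurredAtMs = messaging_delivery_events.occurred_at (the event's own timestamp)
// ASSUMPTION: "within 60s" is inclusive; a negative gap counts as on time.
function webhookReliability(events, thresholdMs = 60000) {
  if (events.length === 0) throw new Error("no webhook events in the window");
  const onTime = events.filter((e) => e.loggedAtMs - e.occurredAtMs <= thresholdMs).length;
  const pct = (onTime * 100) / events.length;
  let points;
  if (pct >= 98) points = 5;
  else if (pct >= 95) points = 4;
  else if (pct >= 90) points = 3;
  else if (pct >= 80) points = 2;
  else points = 0;
  return { pct, points };
}

// 100 events: 97 logged 5s after the event, 3 logged 2 minutes after.
const sample = Array.from({ length: 100 }, (_, i) => ({
  occurredAtMs: 0,
  loggedAtMs: i < 97 ? 5000 : 120000,
}));
console.log(webhookReliability(sample)); // { pct: 97, points: 4 }
```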

Category 7: Support responsiveness (5 points)

Measured as the median first-response time, in hours, for tickets opened during Phase 0B-1 onboarding.

Median first response	Points
< 1 hour (live chat / Slack)	5
1 to < 4 hours	4
4 to < 8 hours	3
8 to 24 hours	2
> 24 hours	1
No response or untracked	0

How to track: log all support interactions in workspace/knowledge-base/<vendor>/SUPPORT-LOG.md during onboarding. Record: ticket type, opened timestamp, first response timestamp, resolution timestamp.
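Once SUPPORT-LOG.md rows are transcribed, computing the median is mechanical. The sketch below uses a hypothetical row shape. It assumes an unanswered ticket counts as an infinitely slow response rather than being dropped, since dropping it would flatter the vendor.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// tickets: rows transcribed from SUPPORT-LOG.md (hypothetical shape):
// { openedAt: ISO string, firstResponseAt: ISO string or null }.
// ASSUMPTION: an unanswered ticket counts as an infinitely slow response.
function medianFirstResponseHours(tickets) {
  if (tickets.length === 0) return null; // untracked
  const hours = tickets
    .map((t) => (t.firstResponseAt == null
      ? Infinity
      : (Date.parse(t.firstResponseAt) - Date.parse(t.openedAt)) / HOUR_MS))
    .sort((a, b) => a - b);
  const mid = Math.floor(hours.length / 2);
  return hours.length % 2 === 1 ? hours[mid] : (hours[mid - 1] + hours[mid]) / 2;
}

// Category 7 bands (max 5); exact edges fall in the slower band.
function supportPoints(medianHours) {
  if (medianHours === null || medianHours === Infinity) return 0; // untracked or no response
  if (medianHours < 1) return 5;
  if (medianHours < 4) return 4;
  if (medianHours < 8) return 3;
  if (medianHours <= 24) return 2;
  return 1;
}

const tickets = [
  { openedAt: "2026-04-20T10:00:00Z", firstResponseAt: "2026-04-20T12:00:00Z" },
  { openedAt: "2026-04-20T10:00:00Z", firstResponseAt: "2026-04-20T13:00:00Z" },
  { openedAt: "2026-04-20T10:00:00Z", firstResponseAt: "2026-04-20T20:00:00Z" },
];
console.log(supportPoints(medianFirstResponseHours(tickets))); // 4 (median 3 hours)
```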

Category 8: Exportability (5 points)

Subscale (5 points total, manual audit):

Sub-check	Max points	PASS criteria
Message history export	1.5	Full sent/received history exportable via API or CSV, no retention wall
Suppression list export	1.5	Phone-level opt-out list exportable, no 90-day expiry or per-page limit
Portable phone numbers	1	Numbers can be ported out to another carrier without vendor approval or fee
No lock-in contracts	1	No multi-year commitment required for the pricing tier tested
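The exportability subscale is a weighted checklist. This document defines partial credit only for Category 5, so the sketch below assumes Category 8 sub-checks are all-or-nothing. The keys are hypothetical.

```javascript
// Hypothetical keys and max points for the four Category 8 sub-checks (total 5).
const EXPORT_SUBCHECKS = { messageHistory: 1.5, suppressionList: 1.5, portableNumbers: 1, noLockIn: 1 };

// ASSUMPTION: each sub-check is all-or-nothing (true = meets the PASS criteria).
function exportabilityPoints(audit) {
  let total = 0;
  for (const [check, max] of Object.entries(EXPORT_SUBCHECKS)) {
    if (audit[check] === true) total += max;
  }
  return total;
}

console.log(exportabilityPoints({
  messageHistory: true, suppressionList: false, portableNumbers: true, noLockIn: true,
})); // 3.5
```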

Scorecard report format

The scorecard generator produces workspace/reports/split-test/<run-id>-results.md in this format:

## Vendor Scorecard — Phase 2 Split Test <run-id>
Generated: <timestamp>
Run period: <start> to <end>
Contacts per group: <N>

| Category | Weight | SalesMsg (control) | Telnyx | Bandwidth | ... |
|---|---|---|---|---|---|
| Delivered rate | 20% | <rate> (<pts> pts) | ... | ... | ... |
| Reply rate | 15% | — (baseline) | ... | ... | ... |
| Qualified reply rate | 20% | — (baseline) | ... | ... | ... |
| Cost per QR | 20% | — (baseline) | ... | ... | ... |
| Compliance tooling | 10% | <pts> | ... | ... | ... |
| Webhook reliability | 5% | <rate> (<pts> pts) | ... | ... | ... |
| Support response | 5% | <pts> | ... | ... | ... |
| Exportability | 5% | <pts> | ... | ... | ... |
| **TOTAL** | 100% | — | **<total>** | **<total>** | ... |

## Gate check status (all 11 required PASS)
...

## Rollback events (any = vendor disqualified from Phase 3)
...

## Recommendation
[Generated based on scores, gate status, and rollback events]

Advancement decision matrix

After Phase 2 results:

Condition	Decision
Score >= 65 AND all gates PASS AND no rollback	Eligible for Phase 3 (Henry decides)
Score >= 55 and < 65 AND all gates PASS AND no rollback	Present to Henry as borderline; detailed cost analysis required
Score < 55 OR delivered rate below the 85% hard floor OR any gate FAILED OR rollback fired	Drop vendor from consideration; document why in evidence packet
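The decision matrix can be sketched as a small function. Drop conditions are checked first so they override any score. The 85% delivered-rate check comes from the Category 1 hard floor. Names and the string return values are hypothetical, and the output only feeds Henry's Approval D; it does not replace it.

```javascript
// Mirrors the advancement matrix; drop conditions take precedence over score.
function advancementDecision({ score, deliveredRatePct, allGatesPass, rollbackFired }) {
  if (!allGatesPass || rollbackFired || deliveredRatePct < 85 || score < 55) {
    return "drop"; // document why in the evidence packet
  }
  if (score >= 65) return "eligible"; // Henry still decides via Approval D
  return "borderline"; // 55 <= score < 65: detailed cost analysis required
}

console.log(advancementDecision({ score: 64.5, deliveredRatePct: 96, allGatesPass: true, rollbackFired: false })); // borderline
```

Using half-open ranges avoids leaving fractional scores such as 64.5 unclassified. Exportability's 1.5-point sub-checks make such scores possible.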

Henry’s Approval D is required before Phase 3 production migration planning begins. The scorecard is an input to that decision, not a substitute for it.

Change log

Date	Change	Author
2026-04-24	Initial document created from plan Section D	Claude Code

Version: v1
Owner: Henry Hill
Last updated: 2026-04-24
Sourced from: messaging-vendor-phase-0-1-2026-04-23 plan Section D
