VENDOR-SCORECARD — Messaging Vendor Evaluation Scorecard v1
Purpose: This document defines the scoring rubric used to rank messaging vendors after Phase 1 smoke tests and Phase 2 split tests. It is the canonical reference for the tools/vendor-scorecard-generator.js script. Henry uses this scorecard to make the Phase 3 go/no-go decision.
Scoring overview
Each vendor receives a score from 0 to 100. The score is the sum of the points earned in 8 categories, where the maximum for each category equals its weight. A vendor must score >= 65 AND pass all 11 gate checks AND trigger no rollback events to advance.
Scores are computed twice: once after Phase 1 (smoke-test scores, small N) and once after Phase 2 (split-test scores, full N). Phase 1 scores are directional only. Phase 2 scores are the ones that matter.
Category weights and measurement
| # | Category | Weight | Data source | How measured |
|---|---|---|---|---|
| 1 | Delivered rate | 20% | messaging_delivery_events | delivered / attempted over Phase 2 window |
| 2 | Reply rate | 15% | messaging_inbound_messages | Inbound messages within 72h / delivered |
| 3 | Qualified reply rate | 20% | omni_events (intent tagged by acq agent) | cash_buyer_interested intents within 72h / delivered |
| 4 | Cost per qualified reply | 20% | messaging_outbound_messages.cost_cents + qualified replies | Invoice cost / qualified replies (inverted: lower cost = higher score) |
| 5 | Compliance tooling | 10% | Manual audit + gate check evidence | 0-10 point subscale |
| 6 | Webhook reliability | 5% | webhook_audit_log.occurred_at vs messaging_delivery_events.occurred_at | % of events whose webhook arrives within 60s of the event timestamp |
| 7 | Support responsiveness | 5% | Manual tracking | Median first-response hours (lower = better) |
| 8 | Exportability | 5% | Manual audit | 0-5 point subscale |
Total: 100 points. Minimum passing score: 65.
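As a rough illustration of how category points combine into the total, the sketch below sums per-category points and rejects values outside each category's maximum. The object keys and the `totalScore` name are hypothetical and are not taken from tools/vendor-scorecard-generator.js; only the maxima come from the weights table above.

```javascript
// Maximum points per category, mirroring the weights table.
// Key names are illustrative assumptions, not the generator's real schema.
const MAX_POINTS = {
  deliveredRate: 20,
  replyRate: 15,
  qualifiedReplyRate: 20,
  costPerQualifiedReply: 20,
  complianceTooling: 10,
  webhookReliability: 5,
  supportResponsiveness: 5,
  exportability: 5,
};

// Sum the points for all 8 categories; throw if any value is missing
// or outside the 0..max range for its category.
function totalScore(points) {
  let total = 0;
  for (const [category, max] of Object.entries(MAX_POINTS)) {
    const value = points[category];
    if (!Number.isFinite(value) || value < 0 || value > max) {
      throw new RangeError(`${category} must be a number between 0 and ${max}`);
    }
    total += value;
  }
  return total;
}

// Example with made-up points for one vendor.
const exampleTotal = totalScore({
  deliveredRate: 18,
  replyRate: 11,
  qualifiedReplyRate: 14,
  costPerQualifiedReply: 17,
  complianceTooling: 8,
  webhookReliability: 4,
  supportResponsiveness: 3,
  exportability: 3.5,
});
console.log(exampleTotal); // 78.5
```

The total alone does not decide advancement; gate checks and rollback events are evaluated separately.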
Category 1: Delivered rate (20 points)
Formula: (delivered / attempted) * 100
Scoring scale:
| Delivered rate | Points |
|---|---|
| >= 97% | 20 |
| 95-96.9% | 18 |
| 92-94.9% | 15 |
| 90-91.9% | 12 |
| 85-89.9% | 8 |
| < 85% | 0 (hard floor, vendor fails regardless of other scores) |
Data source: messaging_delivery_events.status = 'delivered' vs messaging_outbound_messages count for the vendor in the split-test run.
Notes: The 85% hard floor exists because below this threshold, delivery issues are likely systemic (carrier filtering, phone number reputation, or 10DLC compliance issues) rather than statistical noise. A vendor below 85% delivery is not production-ready.
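The Category 1 scale can be sketched as a simple threshold lookup. This is an illustrative reading, not the generator's actual code: the function name and return shape are assumptions, and the bands are treated as "greater than or equal to the lower bound" so that a rate such as 96.95% still lands in a band.

```javascript
// Map delivered/attempted counts to Category 1 points.
// hardFail is true when the rate is below the 85% hard floor.
function deliveredRatePoints(delivered, attempted) {
  if (!(attempted > 0) || delivered < 0 || delivered > attempted) {
    throw new RangeError("expected 0 <= delivered <= attempted and attempted > 0");
  }
  // Multiply before dividing to limit floating-point drift at band edges.
  const rate = (delivered * 100) / attempted;
  let points = 0;
  if (rate >= 97) points = 20;
  else if (rate >= 95) points = 18;
  else if (rate >= 92) points = 15;
  else if (rate >= 90) points = 12;
  else if (rate >= 85) points = 8;
  return { rate, points, hardFail: rate < 85 };
}
```

Returning `hardFail` alongside the points lets a caller fail the vendor outright instead of relying on a 0 in one category.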
Category 2: Reply rate (15 points)
Formula: (inbound_within_72h / delivered) * 100
Scoring uses comparison vs SalesMsg control:
| Reply rate vs control | Points |
|---|---|
| > 20% better than control | 15 |
| 10-20% better | 13 |
| Within 10% of control (either direction) | 11 |
| 10-20% worse than control | 8 |
| 20-40% worse than control | 5 |
| > 40% worse than control | 0 |
Rationale: Reply rate is measured relative to control because absolute reply rates vary by message content, timing, and audience. What matters for vendor selection is whether the vendor’s delivery affects engagement.
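A minimal sketch of the control-relative scoring follows. It rests on several assumptions that the rubric does not spell out: "X% better" is read as a relative difference ((vendor - control) / control), a difference of exactly 10% falls into the outer band, and the 5-point band covers 20-40% worse because the 0-point band takes over beyond 40%. Function names are hypothetical.

```javascript
// Relative difference of the vendor rate vs the control rate, in percent.
// Positive means the vendor is better than control.
function relativeDiffPct(vendorRate, controlRate) {
  if (!(controlRate > 0)) {
    throw new RangeError("control rate must be positive");
  }
  return ((vendorRate - controlRate) / controlRate) * 100;
}

// Category 2 points from the relative difference (assumed band edges).
function replyRatePoints(vendorRate, controlRate) {
  const diff = relativeDiffPct(vendorRate, controlRate);
  if (diff > 20) return 15;   // more than 20% better
  if (diff >= 10) return 13;  // 10-20% better
  if (diff > -10) return 11;  // within 10% either direction
  if (diff >= -20) return 8;  // 10-20% worse
  if (diff >= -40) return 5;  // 20-40% worse
  return 0;                   // more than 40% worse
}
```

For example, a vendor reply rate of 8.5% against a control rate of 10% is 15% worse in relative terms and would earn 8 points under these assumptions.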
Category 3: Qualified reply rate (20 points)
Formula: (cash_buyer_interested_intents_72h / delivered) * 100
A qualified reply requires BOTH: (a) positive sentiment classification by acquisitions agent AND (b) cash_buyer_interested intent tag in omni_events.metadata. Scored relative to SalesMsg control, using the same bands as Category 2 with points scaled to a 20-point maximum.
| Qualified reply rate vs control | Points |
|---|---|
| > 20% better than control | 20 |
| 10-20% better | 17 |
| Within 10% of control | 14 |
| 10-20% worse | 10 |
| 20-40% worse | 5 |
| > 40% worse | 0 |
This is the most important category because it measures the full business outcome. A vendor that converts at the same rate with 60% lower cost wins.
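The two-part qualification rule can be sketched as a filter over reply records. The record shape here (`sentiment`, `metadata.intent`, `deliveredAt`, `repliedAt`) is a hypothetical stand-in for whatever omni_events actually stores, and the 72h window is assumed to start at delivery of the outbound message.

```javascript
const WINDOW_MS = 72 * 60 * 60 * 1000; // 72 hours in milliseconds

// A reply qualifies only if BOTH conditions hold and it arrived in the window.
function isQualifiedReply(reply) {
  const elapsedMs = Date.parse(reply.repliedAt) - Date.parse(reply.deliveredAt);
  return (
    reply.sentiment === "positive" &&
    reply.metadata?.intent === "cash_buyer_interested" &&
    elapsedMs >= 0 &&
    elapsedMs <= WINDOW_MS
  );
}

// Qualified reply rate in percent: qualified replies / delivered messages.
function qualifiedReplyRate(replies, deliveredCount) {
  if (!(deliveredCount > 0)) {
    throw new RangeError("deliveredCount must be positive");
  }
  const qualified = replies.filter(isQualifiedReply).length;
  return (qualified * 100) / deliveredCount;
}
```

The resulting rate would then be compared with the SalesMsg control rate using the bands in the table above.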
Category 4: Cost per qualified reply (20 points)
Formula: vendor_cost_cents_total / qualified_replies (inverted: lower cost = higher score)
Baseline: SalesMsg cost per qualified reply from control group.
| Relative to SalesMsg cost per qualified reply | Points |
|---|---|
| > 50% cheaper | 20 |
| 30-50% cheaper | 17 |
| 10-30% cheaper | 14 |
| Within 10% (either direction) | 11 |
| 10-30% more expensive | 7 |
| 30-50% more expensive | 3 |
| > 50% more expensive | 0 |
Cost inputs by vendor:
- SalesMsg: $0.022 * segments (Henry’s quoted rate, no per-carrier breakdown)
- Telnyx: messaging_outbound_messages.cost_cents from DLR (per-message, per-carrier passthrough)
- Bandwidth: same as Telnyx
- sent.dm: $0.015 per active contact (confirmed; contact = unique phone reached in billing period)
- Bird: blended from DLR cost field in webhook
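The Category 4 formula and bands could be sketched as follows. This is an assumption-laden illustration: function names are hypothetical, band edges at exactly 10%, 30%, and 50% are assigned to the outer band, the 3-point band is read as 30-50% more expensive, and a vendor with zero qualified replies is treated as an error because the rubric does not say how to score that case.

```javascript
// Cost per qualified reply in cents.
function costPerQualifiedReply(totalCostCents, qualifiedReplies) {
  if (!(qualifiedReplies > 0)) {
    throw new RangeError("cost per qualified reply is undefined with no qualified replies");
  }
  return totalCostCents / qualifiedReplies;
}

// Category 4 points; negative diff means the vendor is cheaper than control.
function costPoints(vendorCpqr, controlCpqr) {
  if (!(controlCpqr > 0)) {
    throw new RangeError("control cost must be positive");
  }
  const diff = ((vendorCpqr - controlCpqr) / controlCpqr) * 100;
  if (diff < -50) return 20;  // more than 50% cheaper
  if (diff <= -30) return 17; // 30-50% cheaper
  if (diff <= -10) return 14; // 10-30% cheaper
  if (diff < 10) return 11;   // within 10% either direction
  if (diff <= 30) return 7;   // 10-30% more expensive
  if (diff <= 50) return 3;   // 30-50% more expensive
  return 0;                   // more than 50% more expensive
}
```

With a control cost of 1000 cents per qualified reply, a vendor at 400 cents is 60% cheaper and would earn the full 20 points.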
sent.dm special calculation: sent.dm bills per active contact per month, not per segment. For the split test, the effective per-segment cost for a contact = $0.015 / (segments sent to that contact in the billing period). This cost must then be normalized to a per-qualified-reply basis for the scorecard comparison.
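One way to work through the sent.dm normalization is sketched below. It assumes the split-test window falls inside a single billing period, that a contact with zero segments sent is not billed as active, and that the input is a hypothetical map from phone number to segments sent. None of these names come from the sent.dm API.

```javascript
// $0.015 per active contact, expressed in cents.
const SENT_DM_CENTS_PER_ACTIVE_CONTACT = 1.5;

// segmentsByContact: { "<phone>": segmentsSentInBillingPeriod, ... }
function sentDmCosts(segmentsByContact, qualifiedReplies) {
  // Only contacts that actually received at least one segment count as active.
  const active = Object.entries(segmentsByContact).filter(([, n]) => n > 0);

  // Effective per-segment cost for each contact: flat fee spread over segments.
  const perSegmentCents = Object.fromEntries(
    active.map(([phone, n]) => [phone, SENT_DM_CENTS_PER_ACTIVE_CONTACT / n])
  );

  // Total billed cost is the flat fee times the number of active contacts.
  const totalCents = active.length * SENT_DM_CENTS_PER_ACTIVE_CONTACT;

  return {
    perSegmentCents,
    totalCents,
    // null when there are no qualified replies to divide by.
    centsPerQualifiedReply: qualifiedReplies > 0 ? totalCents / qualifiedReplies : null,
  };
}
```

The per-segment figure is useful for comparing against per-segment vendors, while `centsPerQualifiedReply` is the value that would feed the Category 4 bands.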
Category 5: Compliance tooling (10 points)
Subscale (10 points total, manual audit):
| Sub-check | Max points | PASS criteria |
|---|---|---|
| Auto-STOP processing | 2 | Vendor auto-processes STOP on their side (not just ours); confirmed via G6 gate check evidence |
| 10DLC UX | 2 | Brand/campaign registration can be completed without manual support ticket; TCR status visible in portal |
| Opt-out reporting | 2 | Opt-out counts available in vendor dashboard or API |
| STOP confirmation message | 2 | Vendor sends automatic STOP confirmation to the opted-out contact |
| Suppression list export | 2 | Phone-level suppression list exportable via API or CSV in under 5 minutes |
Partial points: 1 point if the feature exists but requires a workaround. 0 if absent.
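The subscale and its partial-points rule could be tallied as in the sketch below. The check keys and the three result labels (`pass`, `workaround`, `absent`) are hypothetical names for recording the manual audit, not an existing schema.

```javascript
// The five Category 5 sub-checks (illustrative key names).
const COMPLIANCE_CHECKS = [
  "autoStopProcessing",
  "tenDlcUx",
  "optOutReporting",
  "stopConfirmationMessage",
  "suppressionListExport",
];

// Full points for a pass, 1 for a workaround, 0 if the feature is absent.
const RESULT_POINTS = new Map([
  ["pass", 2],
  ["workaround", 1],
  ["absent", 0],
]);

// Sum the subscale; throw if any sub-check is missing or mislabeled
// so an incomplete audit cannot silently score as 0.
function complianceScore(audit) {
  return COMPLIANCE_CHECKS.reduce((sum, check) => {
    const result = audit[check];
    if (!RESULT_POINTS.has(result)) {
      throw new Error(`missing or invalid audit result for ${check}`);
    }
    return sum + RESULT_POINTS.get(result);
  }, 0);
}
```

Throwing on a missing entry is a design choice: it forces the auditor to record an explicit `absent` rather than leaving a gap.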
Category 6: Webhook reliability (5 points)
Formula: (events_arriving_within_60s / total_events) * 100
Measured across all DLR and inbound webhook events during Phase 2 window.
| % events within 60s | Points |
|---|---|
| >= 98% | 5 |
| 95-97.9% | 4 |
| 90-94.9% | 3 |
| 80-89.9% | 2 |
| < 80% | 0 |
Data source: webhook_audit_log.occurred_at vs messaging_delivery_events.occurred_at. The measured value is the gap between the time we logged the webhook arrival and the timestamp of the event itself.
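A sketch of the lag measurement follows, assuming each event has already been joined into a record with two ISO 8601 timestamps. The field names `eventOccurredAt` and `webhookLoggedAt` are hypothetical labels for the two occurred_at columns, and negative lag (clock skew) is counted as on time.

```javascript
const ON_TIME_MS = 60 * 1000; // 60-second threshold

// Compute the share of events logged within 60s and map it to points.
function webhookReliability(events) {
  if (events.length === 0) {
    throw new RangeError("no webhook events in the window");
  }
  const onTime = events.filter((e) => {
    const lagMs = Date.parse(e.webhookLoggedAt) - Date.parse(e.eventOccurredAt);
    // An unparseable timestamp yields NaN, which fails this test and counts as late.
    return lagMs <= ON_TIME_MS;
  }).length;

  const pct = (onTime * 100) / events.length;
  let points = 0;
  if (pct >= 98) points = 5;
  else if (pct >= 95) points = 4;
  else if (pct >= 90) points = 3;
  else if (pct >= 80) points = 2;
  return { pct, points };
}
```

Counting unparseable timestamps as late is deliberately conservative; a real implementation might prefer to surface them as data-quality errors instead.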
Category 7: Support responsiveness (5 points)
Measured: median first-response time in hours for tickets opened during Phase 0B-1 onboarding.
| Median first response | Points |
|---|---|
| < 1 hour (live chat / Slack) | 5 |
| 1-4 hours | 4 |
| 4-8 hours | 3 |
| 8-24 hours | 2 |
| > 24 hours | 1 |
| No response or untracked | 0 |
How to track: log all support interactions in workspace/knowledge-base/<vendor>/SUPPORT-LOG.md during onboarding. Record: ticket type, opened timestamp, first response timestamp, resolution timestamp.
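Once the support log has been transcribed into records, the median and the points could be computed as sketched here. The ticket shape (`openedAt`, `firstResponseAt`) is an assumed representation of SUPPORT-LOG.md entries. Two further assumptions: tickets without a first response are left out of the median (and score 0 only if no ticket was answered), and a median that falls exactly on a band edge goes to the lower-hours band.

```javascript
// Median of a non-empty numeric array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Category 7 points from a list of support tickets.
function supportPoints(tickets) {
  const hours = tickets
    .filter((t) => t.firstResponseAt)
    .map((t) => (Date.parse(t.firstResponseAt) - Date.parse(t.openedAt)) / 3600000);

  // No answered (or no tracked) tickets: the rubric assigns 0 points.
  if (hours.length === 0) return { medianHours: null, points: 0 };

  const m = median(hours);
  let points = 1;            // more than 24 hours
  if (m < 1) points = 5;
  else if (m <= 4) points = 4;
  else if (m <= 8) points = 3;
  else if (m <= 24) points = 2;
  return { medianHours: m, points };
}
```

Excluding unanswered tickets can flatter a vendor that ignores some requests, so the count of unanswered tickets is worth reporting next to the median.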
Category 8: Exportability (5 points)
Subscale (5 points total, manual audit):
| Sub-check | Max points | PASS criteria |
|---|---|---|
| Message history export | 1.5 | Full sent/received history exportable via API or CSV, no retention wall |
| Suppression list export | 1.5 | Phone-level opt-out list exportable, no 90-day expiry or per-page limit |
| Portable phone numbers | 1 | Numbers can be ported out to another carrier without vendor approval or fee |
| No lock-in contracts | 1 | No multi-year commitment required for the pricing tier tested |
Scorecard report format
The scorecard generator produces workspace/reports/split-test/<run-id>-results.md in this format:
## Vendor Scorecard — Phase 2 Split Test <run-id>
Generated: <timestamp>
Run period: <start> to <end>
Contacts per group: <N>
| Category | Weight | SalesMsg (control) | Telnyx | Bandwidth | ... |
|---|---|---|---|---|---|
| Delivered rate | 20% | <rate> (<pts> pts) | ... | ... |
| Reply rate | 15% | — (baseline) | ... | ... |
| Qualified reply rate | 20% | — (baseline) | ... | ... |
| Cost per QR | 20% | — (baseline) | ... | ... |
| Compliance tooling | 10% | <pts> | ... | ... |
| Webhook reliability | 5% | <rate> (<pts> pts) | ... | ... |
| Support response | 5% | <pts> | ... | ... |
| Exportability | 5% | <pts> | ... | ... |
| **TOTAL** | 100% | — | **<total>** | **<total>** | ... |
## Gate check status (all 11 required PASS)
...
## Rollback events (any = vendor disqualified from Phase 3)
...
## Recommendation
[Generated based on scores, gate status, and rollback events]
Advancement decision matrix
After Phase 2 results:
| Condition | Decision |
|---|---|
| Score >= 65 AND all gates PASS AND no rollback | Eligible for Phase 3 (Henry decides) |
| Score 55-64 AND all gates PASS AND no rollback | Present to Henry as borderline; detailed cost analysis required |
| Score < 55 OR any gate FAILED OR rollback fired | Drop vendor from consideration; document why in evidence packet |
Henry’s Approval D is required before Phase 3 production migration planning begins. The scorecard is an input to that decision, not a substitute for it.
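The decision matrix can be expressed as a small function. This is a sketch with hypothetical parameter names; the optional `deliveryHardFloorFailed` flag reflects one reading of the Category 1 hard floor (a vendor below 85% delivery fails regardless of score), which the matrix itself does not list as a separate row.

```javascript
const REQUIRED_GATES = 11; // all 11 gate checks must pass

// Returns "eligible", "borderline", or "drop" per the matrix above.
// "eligible" still requires Henry's approval; it is not an automatic go.
function advancementDecision({
  score,
  gatesPassed,
  rollbackFired,
  deliveryHardFloorFailed = false,
}) {
  const allGatesPass = gatesPassed === REQUIRED_GATES;
  if (!allGatesPass || rollbackFired || deliveryHardFloorFailed || score < 55) {
    return "drop";
  }
  return score >= 65 ? "eligible" : "borderline";
}
```

Because subscales allow half points, a score such as 64.5 is possible; this sketch treats anything from 55 up to but not including 65 as borderline.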
Change log
| Date | Change | Author |
|---|---|---|
| 2026-04-24 | Initial document created from plan Section D | Claude Code |
Version: v1
Owner: Henry Hill
Last updated: 2026-04-24
Sourced from: messaging-vendor-phase-0-1-2026-04-23 plan Section D