VENDOR-SCORECARD — Messaging Vendor Evaluation Scorecard v1

Purpose: This document defines the scoring rubric used to rank messaging vendors after Phase 1 smoke tests and Phase 2 split tests. It is the canonical reference for the tools/vendor-scorecard-generator.js script. Henry uses this scorecard to make the Phase 3 go/no-go decision.


Scoring overview

Each vendor receives a score from 0 to 100, computed as the sum of points earned across 8 weighted categories. To advance, a vendor must score >= 65, pass all 11 gate checks, and trigger no rollback events.

Scores are computed twice: once after Phase 1 (smoke-test scores, small N) and once after Phase 2 (split-test scores, full N). Phase 1 scores are directional only. Phase 2 scores are the ones that matter.


Category weights and measurement

| # | Category | Weight | Data source | How measured |
|---|---|---|---|---|
| 1 | Delivered rate | 20% | messaging_delivery_events | delivered / attempted over Phase 2 window |
| 2 | Reply rate | 15% | messaging_inbound_messages | Inbound messages within 72h / delivered |
| 3 | Qualified reply rate | 20% | omni_events (intent tagged by acq agent) | cash_buyer_interested intents within 72h / delivered |
| 4 | Cost per qualified reply | 20% | messaging_outbound_messages.cost_cents + qualified replies | Invoice cost / qualified replies (inverted: lower cost = higher score) |
| 5 | Compliance tooling | 10% | Manual audit + gate check evidence | 0-10 point subscale |
| 6 | Webhook reliability | 5% | webhook_audit_log.occurred_at vs messaging_delivery_events.occurred_at | % of events arriving within 60s of the event timestamp |
| 7 | Support responsiveness | 5% | Manual tracking | Median first-response hours (lower = better) |
| 8 | Exportability | 5% | Manual audit | 0-5 point subscale |

Total: 100 points. Minimum passing score: 65.
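As a rough sketch of how the per-category points could be totaled, assuming hypothetical category keys (the real field names in tools/vendor-scorecard-generator.js are not specified in this document):

```javascript
// Maximum points per category, mirroring the weights in the table above.
// The key names are illustrative assumptions, not the generator's actual fields.
const MAX_POINTS = {
  deliveredRate: 20,
  replyRate: 15,
  qualifiedReplyRate: 20,
  costPerQualifiedReply: 20,
  complianceTooling: 10,
  webhookReliability: 5,
  supportResponsiveness: 5,
  exportability: 5,
};

// Sum the points for one vendor, rejecting missing or out-of-range values.
function totalScore(points) {
  let total = 0;
  for (const [category, max] of Object.entries(MAX_POINTS)) {
    const value = points[category];
    if (!Number.isFinite(value) || value < 0 || value > max) {
      throw new RangeError(`${category} must be a number between 0 and ${max}`);
    }
    total += value;
  }
  return total;
}

const example = {
  deliveredRate: 18,
  replyRate: 11,
  qualifiedReplyRate: 14,
  costPerQualifiedReply: 17,
  complianceTooling: 8,
  webhookReliability: 4,
  supportResponsiveness: 3,
  exportability: 3.5,
};
console.log(totalScore(example)); // 78.5
```

Because each category's maximum equals its weight, the sum of points is already on the 0-100 scale and needs no further weighting.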


Category 1: Delivered rate (20 points)

Formula: (delivered / attempted) * 100

Scoring scale:

| Delivered rate | Points |
|---|---|
| >= 97% | 20 |
| 95-96.9% | 18 |
| 92-94.9% | 15 |
| 90-91.9% | 12 |
| 85-89.9% | 8 |
| < 85% | 0 (hard floor, vendor fails regardless of other scores) |

Data source: messaging_delivery_events.status = 'delivered' vs messaging_outbound_messages count for the vendor in the split-test run.

Notes: The 85% hard floor exists because below this threshold, delivery issues are likely systemic (carrier filtering, phone number reputation, or 10DLC compliance issues) rather than statistical noise. A vendor below 85% delivery is not production-ready.
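The scale and hard floor can be sketched as a small lookup. This is a minimal illustration, assuming the bands are continuous thresholds (for example, 96.95% falls in the 95% band) and using a hypothetical function name:

```javascript
// Map delivered/attempted counts to Category 1 points.
// hardFail flags the 85% floor, which fails the vendor regardless of other scores.
function deliveredRatePoints(delivered, attempted) {
  if (attempted <= 0) throw new RangeError("attempted must be positive");
  const rate = (delivered * 100) / attempted;
  let points;
  if (rate >= 97) points = 20;
  else if (rate >= 95) points = 18;
  else if (rate >= 92) points = 15;
  else if (rate >= 90) points = 12;
  else if (rate >= 85) points = 8;
  else points = 0;
  return { rate, points, hardFail: rate < 85 };
}

console.log(deliveredRatePoints(9630, 10000)); // { rate: 96.3, points: 18, hardFail: false }
console.log(deliveredRatePoints(840, 1000));   // { rate: 84, points: 0, hardFail: true }
```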


Category 2: Reply rate (15 points)

Formula: (inbound_within_72h / delivered) * 100

Scoring uses comparison vs SalesMsg control:

| Reply rate vs control | Points |
|---|---|
| > 20% better than control | 15 |
| 10-20% better | 13 |
| Within 10% of control (either direction) | 11 |
| 10-20% worse than control | 8 |
| 20-40% worse than control | 5 |
| > 40% worse than control | 0 |

Rationale: Reply rate is measured relative to control because absolute reply rates vary by message content, timing, and audience. What matters for vendor selection is whether the vendor’s delivery affects engagement.
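A minimal sketch of the control-relative scoring, under two assumptions this document does not state explicitly: "X% better" means relative change versus the control rate (not percentage points), and a value sitting exactly on a shared boundary goes to the band nearer the control. More than 40% worse scores 0 and 20-40% worse scores 5. Function names are hypothetical.

```javascript
// Relative change of the vendor rate versus the SalesMsg control, in percent.
function relativeChangePct(vendorRate, controlRate) {
  if (controlRate <= 0) throw new RangeError("control rate must be positive");
  return ((vendorRate - controlRate) / controlRate) * 100;
}

// Category 2 points from the relative change.
function replyRatePoints(vendorRate, controlRate) {
  const change = relativeChangePct(vendorRate, controlRate);
  if (change > 20) return 15;
  if (change > 10) return 13;
  if (change >= -10) return 11;
  if (change >= -20) return 8;
  if (change >= -40) return 5;
  return 0;
}

console.log(replyRatePoints(6.5, 5.0)); // 15 (30% better than control)
console.log(replyRatePoints(3.5, 5.0)); // 5  (30% worse than control)
```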


Category 3: Qualified reply rate (20 points)

Formula: (cash_buyer_interested_intents_72h / delivered) * 100

A qualified reply requires BOTH: (a) positive sentiment classification by acquisitions agent AND (b) cash_buyer_interested intent tag in omni_events.metadata. Scored relative to SalesMsg control, using the same comparison bands as Category 2 with the point values below.

| Qualified reply rate vs control | Points |
|---|---|
| > 20% better than control | 20 |
| 10-20% better | 17 |
| Within 10% of control | 14 |
| 10-20% worse | 10 |
| 20-40% worse | 5 |
| > 40% worse | 0 |

This is the most important category because it measures the full business outcome. A vendor that converts at the same rate with 60% lower cost wins.
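The two-part qualification rule can be sketched as a filter. The reply object shape here (sentiment, intent, deliveredAt, receivedAt) is a hypothetical flattening of omni_events rows, and measuring the 72h window from the delivery timestamp is an assumption:

```javascript
const WINDOW_MS = 72 * 60 * 60 * 1000;

// A reply qualifies only if BOTH conditions hold and it arrived within 72h.
function isQualifiedReply(reply) {
  const elapsedMs = Date.parse(reply.receivedAt) - Date.parse(reply.deliveredAt);
  return (
    reply.sentiment === "positive" &&
    reply.intent === "cash_buyer_interested" &&
    elapsedMs >= 0 &&
    elapsedMs <= WINDOW_MS
  );
}

// Qualified reply rate as a percentage of delivered messages.
function qualifiedReplyRate(replies, deliveredCount) {
  if (deliveredCount <= 0) throw new RangeError("deliveredCount must be positive");
  const qualified = replies.filter(isQualifiedReply).length;
  return (qualified * 100) / deliveredCount;
}

const replies = [
  // Qualifies: positive, tagged, 10h after delivery.
  { sentiment: "positive", intent: "cash_buyer_interested", deliveredAt: "2026-04-20T10:00:00Z", receivedAt: "2026-04-20T20:00:00Z" },
  // Fails (b): positive but a different intent tag.
  { sentiment: "positive", intent: "wrong_number", deliveredAt: "2026-04-20T10:00:00Z", receivedAt: "2026-04-20T11:00:00Z" },
  // Fails the window: 80h after delivery.
  { sentiment: "positive", intent: "cash_buyer_interested", deliveredAt: "2026-04-20T10:00:00Z", receivedAt: "2026-04-23T18:00:00Z" },
];
console.log(qualifiedReplyRate(replies, 50)); // 2
```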


Category 4: Cost per qualified reply (20 points)

Formula: vendor_cost_cents_total / qualified_replies (inverted: lower cost = higher score)

Baseline: SalesMsg cost per qualified reply from control group.

| Relative to SalesMsg cost per qualified reply | Points |
|---|---|
| > 50% cheaper | 20 |
| 30-50% cheaper | 17 |
| 10-30% cheaper | 14 |
| Within 10% (either direction) | 11 |
| 10-30% more expensive | 7 |
| 30-50% more expensive | 3 |
| > 50% more expensive | 0 |

Cost inputs by vendor:

  • SalesMsg: $0.022 * segments (Henry’s quoted rate, no per-carrier breakdown)
  • Telnyx: messaging_outbound_messages.cost_cents from DLR (per-message, per-carrier passthrough)
  • Bandwidth: same as Telnyx
  • sent.dm: $0.015 per active contact (confirmed; contact = unique phone reached in billing period)
  • Bird: blended from DLR cost field in webhook

sent.dm special calculation: sent.dm bills per active contact per month, not per segment. For the split test, the effective per-segment cost for a contact = $0.015 / (segments sent to that contact in the billing period). As with every other vendor, the sent.dm cost must then be normalized to a per-qualified-reply basis for the scorecard comparison.
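One way to put sent.dm on the same footing as the per-segment vendors is to total its cost for the run and divide by qualified replies. This is a sketch under stated assumptions: the whole split test falls inside one billing period, boundary values go to the band nearer the control, and all function names are hypothetical.

```javascript
// sent.dm: 1.5 cents ($0.015) per unique phone number reached in the billing period.
function sentDmTotalCostCents(phoneNumbers) {
  return new Set(phoneNumbers).size * 1.5;
}

// Cost per qualified reply in cents; a vendor with no qualified replies
// is treated as infinitely expensive.
function costPerQualifiedReplyCents(totalCostCents, qualifiedReplies) {
  return qualifiedReplies > 0 ? totalCostCents / qualifiedReplies : Infinity;
}

// Category 4 points relative to the SalesMsg control (negative change = cheaper).
function costPoints(vendorCpqr, controlCpqr) {
  if (!Number.isFinite(controlCpqr) || controlCpqr <= 0) {
    throw new RangeError("control cost per qualified reply must be positive and finite");
  }
  const change = ((vendorCpqr - controlCpqr) / controlCpqr) * 100;
  if (change < -50) return 20;
  if (change < -30) return 17;
  if (change < -10) return 14;
  if (change <= 10) return 11;
  if (change <= 30) return 7;
  if (change <= 50) return 3;
  return 0;
}

// Illustrative numbers only: control spent 2200 cents for 20 qualified replies;
// sent.dm reached 500 unique contacts and produced 15 qualified replies.
const controlCpqr = costPerQualifiedReplyCents(2200, 20); // 110
const contacts = Array.from({ length: 500 }, (_, i) => `+1555000${String(i).padStart(4, "0")}`);
const sentDmCpqr = costPerQualifiedReplyCents(sentDmTotalCostCents(contacts), 15); // 50
console.log(costPoints(sentDmCpqr, controlCpqr)); // 20
```

Deduplicating phone numbers with a Set reflects the billing definition above: a contact messaged several times in the period is charged once.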


Category 5: Compliance tooling (10 points)

Subscale (10 points total, manual audit):

| Sub-check | Max points | PASS criteria |
|---|---|---|
| Auto-STOP processing | 2 | Vendor auto-processes STOP on their side (not just ours); confirmed via G6 gate check evidence |
| 10DLC UX | 2 | Brand/campaign registration can be completed without manual support ticket; TCR status visible in portal |
| Opt-out reporting | 2 | Opt-out counts available in vendor dashboard or API |
| STOP confirmation message | 2 | Vendor sends automatic STOP confirmation to the opted-out contact |
| Suppression list export | 2 | Phone-level suppression list exportable via API or CSV in under 5 minutes |

Partial points: 1 point if the feature exists but requires a workaround. 0 if absent.
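The subscale with its partial-credit rule can be tallied as follows. The check keys and the "pass" / "workaround" / "absent" labels are hypothetical names for the manual audit results:

```javascript
// The five sub-checks from the table above (illustrative key names).
const COMPLIANCE_CHECKS = [
  "autoStop",
  "tenDlcUx",
  "optOutReporting",
  "stopConfirmation",
  "suppressionExport",
];

// Full pass = 2, feature exists but needs a workaround = 1, absent = 0.
const RESULT_POINTS = new Map([
  ["pass", 2],
  ["workaround", 1],
  ["absent", 0],
]);

function complianceToolingPoints(audit) {
  return COMPLIANCE_CHECKS.reduce((sum, check) => {
    const result = audit[check];
    if (!RESULT_POINTS.has(result)) {
      throw new Error(`missing or invalid audit result for ${check}`);
    }
    return sum + RESULT_POINTS.get(result);
  }, 0);
}

console.log(
  complianceToolingPoints({
    autoStop: "pass",
    tenDlcUx: "workaround",
    optOutReporting: "pass",
    stopConfirmation: "pass",
    suppressionExport: "absent",
  })
); // 7
```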


Category 6: Webhook reliability (5 points)

Formula: (events_arriving_within_60s / total_events) * 100

Measured across all DLR and inbound webhook events during Phase 2 window.

| % events within 60s | Points |
|---|---|
| >= 98% | 5 |
| 95-97.9% | 4 |
| 90-94.9% | 3 |
| 80-89.9% | 2 |
| < 80% | 0 |

Data source: webhook_audit_log.occurred_at vs messaging_delivery_events.occurred_at. Latency is the gap between when we logged the webhook arrival and the timestamp of the event itself.
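A sketch of the latency calculation, assuming each row has already been joined into a hypothetical { eventOccurredAt, webhookLoggedAt } pair of ISO timestamps. Negative gaps caused by clock skew are counted as on time here, which is a simplifying assumption:

```javascript
const LATENCY_LIMIT_MS = 60 * 1000;

// Percentage of events whose webhook was logged within 60s of the event
// timestamp, plus the Category 6 points for that percentage.
function webhookReliability(events) {
  if (events.length === 0) throw new RangeError("no events to score");
  const onTime = events.filter((e) => {
    const latencyMs = Date.parse(e.webhookLoggedAt) - Date.parse(e.eventOccurredAt);
    return latencyMs <= LATENCY_LIMIT_MS;
  }).length;
  const pct = (onTime * 100) / events.length;
  let points;
  if (pct >= 98) points = 5;
  else if (pct >= 95) points = 4;
  else if (pct >= 90) points = 3;
  else if (pct >= 80) points = 2;
  else points = 0;
  return { pct, points };
}

const sampleEvents = [
  { eventOccurredAt: "2026-04-20T10:00:00Z", webhookLoggedAt: "2026-04-20T10:00:02Z" },
  { eventOccurredAt: "2026-04-20T10:05:00Z", webhookLoggedAt: "2026-04-20T10:05:45Z" },
  { eventOccurredAt: "2026-04-20T10:10:00Z", webhookLoggedAt: "2026-04-20T10:10:59Z" },
  { eventOccurredAt: "2026-04-20T10:15:00Z", webhookLoggedAt: "2026-04-20T10:18:00Z" }, // 180s late
];
console.log(webhookReliability(sampleEvents)); // { pct: 75, points: 0 }
```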


Category 7: Support responsiveness (5 points)

Measured: median first-response time in hours for tickets opened during Phase 0B-1 onboarding.

| Median first response | Points |
|---|---|
| < 1 hour (live chat / Slack) | 5 |
| 1-4 hours | 4 |
| 4-8 hours | 3 |
| 8-24 hours | 2 |
| > 24 hours | 1 |
| No response or untracked | 0 |

How to track: log all support interactions in workspace/knowledge-base/<vendor>/SUPPORT-LOG.md during onboarding. Record: ticket type, opened timestamp, first response timestamp, resolution timestamp.
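Given a support log with the timestamps above, the median and its points could be computed like this. The ticket field names are hypothetical, only tickets with both timestamps are counted, and shared band edges are resolved as lower-bound inclusive (exactly 4 hours scores 3), except that exactly 24 hours stays in the 8-24 band; all of these are assumptions:

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Median of a list of numbers (average of the two middle values when even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Category 7 points from the median first-response time.
function supportPoints(tickets) {
  const hours = tickets
    .filter((t) => t.openedAt && t.firstResponseAt)
    .map((t) => (Date.parse(t.firstResponseAt) - Date.parse(t.openedAt)) / HOUR_MS);
  if (hours.length === 0) return 0; // no response or untracked
  const m = median(hours);
  if (m < 1) return 5;
  if (m < 4) return 4;
  if (m < 8) return 3;
  if (m <= 24) return 2;
  return 1;
}

const tickets = [
  { openedAt: "2026-04-20T09:00:00Z", firstResponseAt: "2026-04-20T09:30:00Z" }, // 0.5h
  { openedAt: "2026-04-21T09:00:00Z", firstResponseAt: "2026-04-21T11:00:00Z" }, // 2h
  { openedAt: "2026-04-22T09:00:00Z", firstResponseAt: "2026-04-22T15:00:00Z" }, // 6h
];
console.log(supportPoints(tickets)); // 4 (median is 2 hours)
```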


Category 8: Exportability (5 points)

Subscale (5 points total, manual audit):

| Sub-check | Max points | PASS criteria |
|---|---|---|
| Message history export | 1.5 | Full sent/received history exportable via API or CSV, no retention wall |
| Suppression list export | 1.5 | Phone-level opt-out list exportable, no 90-day expiry or per-page limit |
| Portable phone numbers | 1 | Numbers can be ported out to another carrier without vendor approval or fee |
| No lock-in contracts | 1 | No multi-year commitment required for the pricing tier tested |

Scorecard report format

The scorecard generator produces workspace/reports/split-test/<run-id>-results.md in this format:

## Vendor Scorecard — Phase 2 Split Test <run-id>
Generated: <timestamp>
Run period: <start> to <end>
Contacts per group: <N>

| Category | Weight | SalesMsg (control) | Telnyx | Bandwidth | ... |
|---|---|---|---|---|---|
| Delivered rate | 20% | <rate> (<pts> pts) | ... | ... |
| Reply rate | 15% | — (baseline) | ... | ... |
| Qualified reply rate | 20% | — (baseline) | ... | ... |
| Cost per QR | 20% | — (baseline) | ... | ... |
| Compliance tooling | 10% | <pts> | ... | ... |
| Webhook reliability | 5% | <rate> (<pts> pts) | ... | ... |
| Support response | 5% | <pts> | ... | ... |
| Exportability | 5% | <pts> | ... | ... |
| **TOTAL** | 100% | — | **<total>** | **<total>** | ... |

## Gate check status (all 11 required PASS)
...

## Rollback events (any = vendor disqualified from Phase 3)
...

## Recommendation
[Generated based on scores, gate status, and rollback events]

Advancement decision matrix

After Phase 2 results:

| Condition | Decision |
|---|---|
| Score >= 65 AND all gates PASS AND no rollback | Eligible for Phase 3 (Henry decides) |
| Score 55-64 AND all gates PASS AND no rollback | Present to Henry as borderline; detailed cost analysis required |
| Score < 55 OR any gate FAILED OR rollback fired | Drop vendor from consideration; document why in evidence packet |

Henry’s Approval D is required before Phase 3 production migration planning begins. The scorecard is an input to that decision, not a substitute for it.
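The matrix can be read as a short decision function. This sketch uses hypothetical names and makes two assumptions: fractional scores between 64 and 65 are treated as borderline, and the Category 1 hard floor (delivered rate below 85%) is folded in as a drop condition even though the matrix itself does not list it.

```javascript
// Returns "eligible", "borderline", or "drop" for one vendor after Phase 2.
// "eligible" only means the vendor can be presented for Henry's decision.
function advancementDecision({ score, allGatesPass, rollbackFired, deliveredRateHardFail = false }) {
  if (deliveredRateHardFail || !allGatesPass || rollbackFired || score < 55) {
    return "drop";
  }
  if (score < 65) {
    return "borderline";
  }
  return "eligible";
}

console.log(advancementDecision({ score: 78.5, allGatesPass: true, rollbackFired: false })); // "eligible"
console.log(advancementDecision({ score: 60, allGatesPass: true, rollbackFired: false }));   // "borderline"
console.log(advancementDecision({ score: 80, allGatesPass: true, rollbackFired: true }));    // "drop"
```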


Change log

| Date | Change | Author |
|---|---|---|
| 2026-04-24 | Initial document created from plan Section D | Claude Code |

Version: v1
Owner: Henry Hill
Last updated: 2026-04-24
Sourced from: messaging-vendor-phase-0-1-2026-04-23 plan Section D