
Copilot Promise Ledger for SaaS Support Ops

Status: graduated [TRIANGULATED] · Filter: 11.0/15 · Spread: ±3.0 · Signals: 2 independent
What is this?
When a SaaS support team deploys Intercom Fin, Decagon, Sierra, or Ada, the vendor's dashboard reports a rosy auto-resolution rate. What the head of support ops cannot independently verify is whether the copilot's customer-facing commitments ('we'll fix this by Friday', 'I've escalated this', 'refund processed') are actually kept, as recorded in Zendesk. The vendor sells the headline number; the ops lead absorbs the SLA breaches and CSAT damage when promises don't hold.

Copilot Promise Ledger is a thin overlay on Zendesk, Intercom, or Front. The ops lead flags copilot replies that contain commitments (or imports the vendor's reply category); the ledger then waits for the resolution event (ticket close, CSAT score, SLA breach log) and grades each promise against what actually happened. After 4–8 weeks, the lead has a catalog of miss patterns by commitment type, which informs renewal negotiations and requests for tighter prompts.

AE is uniquely suited to this: its 6-pattern autopsy taxonomy clusters recurring failure modes (e.g. temporal blindness on date promises), its sub-24h grading loop fits the 3–14 day ticket resolution window, and ground truth lives in Zendesk, so there is no rubric subjectivity and no internal trace access is required.
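The flag, wait, and grade loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the regex cues stand in for the ops lead's manual flags, and `Promise`, `Resolution`, `extract_promises`, `promise_kept`, and `miss_catalog` are hypothetical names. The grading rule (a promise is kept only if the ticket closed without an SLA breach, and before the deadline for date promises) is also an assumption, and CSAT is not modeled.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical cues per commitment type; they stand in for the ops lead's
# manual flags or the vendor's imported reply category.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
COMMITMENT_CUES = {
    "date_promise": re.compile(r"\bby (" + "|".join(WEEKDAYS) + r")\b", re.I),
    "escalation": re.compile(r"\bescalated\b", re.I),
    "refund": re.compile(r"\brefund (?:has been )?processed\b", re.I),
}


@dataclass
class Promise:
    ticket_id: str
    commitment_type: str
    made_at: datetime
    deadline: Optional[datetime] = None  # only date promises carry a deadline


@dataclass
class Resolution:
    ticket_id: str
    closed_at: Optional[datetime]  # None if the event is an SLA breach on an open ticket
    sla_breached: bool


def resolve_weekday_deadline(sent_at: datetime, weekday: str) -> datetime:
    """'by <weekday>' -> end of the next such day on or after the reply date."""
    days_ahead = (WEEKDAYS.index(weekday) - sent_at.weekday()) % 7
    day_start = sent_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start + timedelta(days=days_ahead, hours=23, minutes=59, seconds=59)


def extract_promises(ticket_id: str, reply: str, sent_at: datetime) -> List[Promise]:
    """Flag each commitment type whose cue appears in a copilot reply."""
    promises = []
    for ctype, pattern in COMMITMENT_CUES.items():
        match = pattern.search(reply)
        if match is None:
            continue
        deadline = None
        if ctype == "date_promise":
            deadline = resolve_weekday_deadline(sent_at, match.group(1).lower())
        promises.append(Promise(ticket_id, ctype, sent_at, deadline))
    return promises


def promise_kept(promise: Promise, outcome: Resolution) -> bool:
    """Hypothetical rule: kept = closed, no SLA breach, and on time if dated."""
    if outcome.closed_at is None or outcome.sla_breached:
        return False
    if promise.deadline is not None:
        return outcome.closed_at <= promise.deadline
    return True


def miss_catalog(promises: List[Promise], outcomes: Dict[str, Resolution]) -> Dict[str, Tuple[int, int]]:
    """(misses, graded) per commitment type; promises with no outcome keep waiting."""
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for p in promises:
        outcome = outcomes.get(p.ticket_id)
        if outcome is None:
            continue  # no resolution event yet
        tally[p.commitment_type][1] += 1
        if not promise_kept(p, outcome):
            tally[p.commitment_type][0] += 1
    return {ctype: (misses, graded) for ctype, (misses, graded) in tally.items()}


# Usage with made-up tickets; 2026-05-13 is a Wednesday.
wed = datetime(2026, 5, 13)
replies = [
    ("T1", "We'll fix this by Friday.", wed.replace(hour=10)),
    ("T2", "I've escalated this to engineering.", wed.replace(hour=11)),
    ("T3", "Your refund has been processed.", wed.replace(hour=9)),
    ("T4", "It will be live by Friday.", wed.replace(hour=9)),
]
promises = [p for tid, text, ts in replies for p in extract_promises(tid, text, ts)]
outcomes = {
    "T1": Resolution("T1", datetime(2026, 5, 18, 9, 0), sla_breached=False),  # closed Monday: late
    "T2": Resolution("T2", datetime(2026, 5, 14, 16, 0), sla_breached=False),
    "T3": Resolution("T3", datetime(2026, 5, 13, 12, 0), sla_breached=True),
    "T4": Resolution("T4", datetime(2026, 5, 15, 12, 0), sla_breached=False),
}
print(miss_catalog(promises, outcomes))
# {'date_promise': (1, 2), 'escalation': (0, 1), 'refund': (1, 1)}
```

Resolving 'by Friday' against the reply's own timestamp, rather than the date the grade is computed, is what makes a date promise gradable at all; under this rule a reply sent on a Friday that says 'by Friday' gets a same-day deadline.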
Why did we consider it?
As AI copilots become table stakes in SaaS support, an independent reality-graded ledger of copilot promises is the natural buyer-side instrument — and AE's prediction-grading taxonomy is already the right shape to build it.
What breaks?
  • Modern support agents use deterministic tool-calling rather than text-based promises, making the core extraction premise obsolete.
  • Verifying real-world resolutions requires deep integrations with external systems (Stripe, Jira), violating the 'NOT multi-tenant SaaS' and solo developer constraints.
  • The output is merely a vendor complaint log, lacking the actionable ROI needed to secure £100K–£300K ARR.
What did we learn?
Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). ⚠ 4 load-bearing contradictions found. The trust gap is real and the AE fit is clean, but no buyer has been observed and the episodic-vs-recurring tension is unresolved. Run the Week 1 outbound before building.

Filter scores

Five axes, each scored 0–3. Three independent runs, each from a different model perspective; the median is shown.

Axis | What it measures
data moat | Does this product accumulate proprietary data that compounds?
10x model test | Does a better model make this more valuable, or redundant?
fast feedback loops | Can outputs be graded against reality in <30 days?
solo founder feasible | Can a solo operator build and run this without a team?
AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this?
Composite median: 11.0 / 15. Graduation threshold: 9.0. IQR across runs: 3.0.
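The composite arithmetic can be checked with a short sketch. It assumes that each run's composite is the plain sum of its five axis scores, that the reported composite is the median of the per-run totals, and that the IQR uses the standard library's inclusive quantile method; the page states none of these. The per-axis scores below are illustrative placeholders chosen to reproduce the 11.0 median and 3.0 IQR, not the actual run outputs.

```python
from statistics import median, quantiles
from typing import List

AXES = [
    "data moat",
    "10x model test",
    "fast feedback loops",
    "solo founder feasible",
    "AI providers can't eat it",
]
GRADUATION_THRESHOLD = 9.0

# Illustrative per-run scores (0-3 on each axis), NOT the real run outputs.
runs = {
    "run_a": [2, 1, 2, 2, 1],
    "run_b": [3, 2, 2, 2, 2],
    "run_c": [3, 3, 3, 3, 2],
}


def composite(scores: List[int]) -> int:
    """Sum one run's axis scores after checking the 0-3 scale."""
    if len(scores) != len(AXES) or not all(0 <= s <= 3 for s in scores):
        raise ValueError("expected one 0-3 score per axis")
    return sum(scores)


totals = [composite(s) for s in runs.values()]  # [8, 11, 14]
composite_median = median(totals)  # 11
q1, _, q3 = quantiles(totals, n=4, method="inclusive")
iqr = q3 - q1  # 3.0
graduated = composite_median >= GRADUATION_THRESHOLD
print(composite_median, iqr, graduated)  # 11 3.0 True
```

With only three runs the spread depends heavily on the quantile method: the default `exclusive` method of `statistics.quantiles` would report an IQR of 6.0 for the same totals.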

Evidence

Signal A — Primary source

M365 Copilot is designed as a general purpose tool to help workers digest information by summarizing emails, meetings, or documents, create new ...

Signal D — Demand proxy

Found: yes
Summary: Hacker News discussion surfaces community frustration that Copilot broke audit logs and Microsoft lacks transparency about it, directly evidencing demand for independent copilot accountability tooling in enterprise workflows.
Source: https://news.ycombinator.com/item?id=44957454
Reason: HN thread 'Copilot broke audit logs, but Microsoft won't tell customers' shows real practitioner concern about copilot observability and audit-trail integrity — the exact trust gap the Promise Ledger targets. Remaining results (Reddit SaaS feedback, Facebook vibe-coding post) a…

Evaluation history

When | Stage | Phase
2026-05-13 08:34 | deep_council_verdict | graduated
2026-05-13 08:30 | deep_claude_take | graduated
2026-05-13 08:28 | deep_90day_plan | graduated
2026-05-13 08:27 | deep_risk | graduated
2026-05-13 08:26 | deep_distribution | graduated
2026-05-13 08:24 | deep_pricing | graduated
2026-05-13 08:23 | deep_moat | graduated
2026-05-13 08:22 | deep_buyer_sim | graduated
2026-05-13 08:21 | deep_icp | graduated
2026-05-13 08:20 | deep_competitor | graduated
2026-05-13 08:18 | deep_market_reality | graduated
2026-05-13 08:13 | filter_score | scored
2026-05-13 08:07 | filter_score | scored
2026-05-13 07:55 | filter_score | scored
2026-05-13 07:50 | filter_score | scored
2026-05-13 07:42 | filter_score | scored
2026-05-13 07:37 | filter_score | scored
2026-05-13 07:24 | filter_score | scored
2026-05-13 07:18 | filter_score | scored
2026-05-13 07:12 | evidence_search | argument
2026-05-13 01:30 | evidence_search | argument
2026-05-12 23:36 | evidence_search | argument
2026-05-12 21:48 | evidence_search | argument
2026-05-12 20:06 | evidence_search | argument
2026-05-12 18:06 | evidence_search | argument
2026-05-12 16:18 | evidence_search | argument
2026-05-12 14:24 | evidence_search | argument
2026-05-12 12:42 | evidence_search | argument
2026-05-12 10:48 | evidence_search | argument
2026-05-12 09:06 | evidence_search | argument
2026-05-12 07:18 | evidence_search | argument
2026-05-12 05:36 | evidence_search | argument
2026-05-11 19:36 | evidence_search | argument
2026-05-11 18:06 | evidence_search | argument
2026-05-11 16:36 | evidence_search | argument
2026-05-11 15:06 | evidence_search | argument
2026-05-11 13:36 | evidence_search | argument
2026-05-11 06:36 | evidence_search | argument
2026-05-11 05:24 | evidence_search | argument
2026-05-11 04:42 | evidence_search | argument
2026-05-11 04:18 | evidence_search | argument
2026-05-11 03:48 | evidence_search | argument
2026-05-11 03:18 | evidence_search | argument
2026-05-11 02:48 | evidence_search | argument
2026-05-11 01:00 | evidence_search | argument
2026-05-11 00:12 | evidence_search | argument
2026-05-11 00:06 | evidence_search | argument
2026-05-11 00:00 | audience_simulation | argument
2026-05-10 23:54 | red_team_kill | argument
2026-05-10 23:48 | steelman | argument
2026-05-10 23:44 | genesis | argument