Copilot Promise Ledger for SaaS Support Ops

Status: graduated [TRIANGULATED] · filter 11.0/15 · spread ±3.0 · signals: 2 independent

What is this?

When a SaaS support team deploys Intercom Fin, Decagon, Sierra, or Ada, the vendor's dashboard reports a rosy auto-resolution rate. What the head of support ops cannot independently see is whether the copilot's customer-facing commitments ('we'll fix this by Friday', 'I've escalated this', 'refund processed') actually land in Zendesk reality. The vendor sells the headline number; the ops lead absorbs the SLA breaches and CSAT damage when promises don't hold.

Copilot Promise Ledger is a thin overlay on Zendesk/Intercom/Front. The ops lead flags copilot replies containing commitments (or imports the vendor's reply category); the ledger then waits for the resolution event (ticket close, CSAT score, SLA breach log) and grades each promise against reality. After 4–8 weeks, the lead has a miss-pattern catalog by commitment type, which informs renewal negotiations and prompt-tightening asks.

AE is uniquely suited for three reasons: the 6-pattern autopsy taxonomy clusters recurring failure modes (e.g. temporal blindness on date promises), the sub-24h grading loop matches the 3–14 day ticket resolution window, and ground truth lives in Zendesk, so there is no rubric subjectivity and no need for internal trace access.
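
The flag, wait, and grade loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the `Promise` and `ResolutionEvent` records, the commitment-type labels, and the grading rule (an SLA breach or a close after the promised deadline counts as a miss) are all hypothetical, and no real Zendesk, Intercom, or Front API is called.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class Promise:
    """A copilot reply flagged as containing a commitment (hypothetical record)."""
    ticket_id: str
    commitment_type: str                # assumed labels, e.g. "date_promise", "refund"
    made_at: datetime
    due_at: Optional[datetime] = None   # deadline implied by the reply, if any


@dataclass
class ResolutionEvent:
    """Ground truth pulled from the helpdesk (hypothetical record)."""
    ticket_id: str
    closed_at: Optional[datetime]       # None while the ticket is still open
    sla_breached: bool


def grade(promise: Promise, event: Optional[ResolutionEvent]) -> str:
    """Grade one promise as 'kept', 'missed', or 'pending' (assumed rule)."""
    if event is not None and event.sla_breached:
        return "missed"
    if event is None or event.closed_at is None:
        return "pending"                # no resolution event yet, keep waiting
    if promise.due_at is not None and event.closed_at > promise.due_at:
        return "missed"                 # resolved, but after the promised date
    return "kept"


def miss_catalog(
    promises: Iterable[Promise], events: Iterable[ResolutionEvent]
) -> Dict[str, Dict[str, int]]:
    """Count grades per commitment type: the miss-pattern catalog."""
    by_ticket = {e.ticket_id: e for e in events}
    catalog: Dict[str, Counter] = defaultdict(Counter)
    for p in promises:
        catalog[p.commitment_type][grade(p, by_ticket.get(p.ticket_id))] += 1
    return {ctype: dict(counts) for ctype, counts in catalog.items()}
```

Calling `miss_catalog(promises, events)` after each helpdesk sync would yield per-type counts of kept, missed, and pending promises. CSAT scores are left out of this sketch; folding them into the grade would need a threshold the source does not specify.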

Why did we consider it?

As AI copilots become table stakes in SaaS support, an independent reality-graded ledger of copilot promises is the natural buyer-side instrument — and AE's prediction-grading taxonomy is already the right shape to build it.

What breaks?

Modern support agents use deterministic tool-calling rather than text-based promises, making the core extraction premise obsolete.
Verifying real-world resolutions requires deep integrations with external systems (Stripe, Jira), violating the 'NOT multi-tenant SaaS' and solo developer constraints.
The output is merely a vendor complaint log, lacking the actionable ROI needed to secure £100K-£300K ARR.

What did we learn?

Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). ⚠ 4 load-bearing contradictions found. There is a real trust gap and a clean AE fit, but no buyer has been observed and the episodic-vs-recurring tension is unresolved; run Week 1 outbound before building.

Filter scores

Five axes, each scored 0-3, across three independent runs by different model perspectives. The median is shown.

Axis	What it measures
data moat	Does this product accumulate proprietary data that compounds?
10x model test	Does a better model make this more valuable, or redundant?
fast feedback loops	Can outputs be graded against reality in <30 days?
solo founder feasible	Can a solo operator build and run this without a team?
AI providers can't eat it	Do hyperscalers have structural reasons NOT to build this?

Composite median: 11.0 / 15. Graduation threshold: 9.0. IQR across runs: 3.0.
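
As a rough illustration of how a composite like the one above could be derived, the sketch below sums the five axis scores per run, takes the median of the run totals, and compares it with the graduation threshold. Several things here are assumptions rather than facts from the source: that the composite is the median of per-run totals (rather than a sum of per-axis medians), that the IQR uses the inclusive quartile method, and the function names. Any scores passed in are the caller's own; none of the actual run scores are reproduced here.

```python
from statistics import median, quantiles
from typing import Dict, List

# Axis names as listed in the table above.
AXES = (
    "data moat",
    "10x model test",
    "fast feedback loops",
    "solo founder feasible",
    "AI providers can't eat it",
)
MAX_PER_AXIS = 3             # each axis is scored 0-3
GRADUATION_THRESHOLD = 9.0   # from the report


def composite(run: Dict[str, float]) -> float:
    """Sum one run's axis scores after validating names and ranges."""
    missing = set(AXES) - set(run)
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    for axis in AXES:
        if not 0 <= run[axis] <= MAX_PER_AXIS:
            raise ValueError(f"{axis!r} score out of range: {run[axis]}")
    return float(sum(run[axis] for axis in AXES))


def summarize(runs: List[Dict[str, float]]) -> Dict[str, object]:
    """Median and IQR of run totals, plus the graduation decision.

    Assumes at least two runs, since quartiles need two data points.
    """
    totals = [composite(run) for run in runs]
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    mid = median(totals)
    return {
        "composite_median": mid,
        "iqr": q3 - q1,
        "graduated": mid >= GRADUATION_THRESHOLD,
    }
```

Taking the median of totals keeps a single outlier run from deciding graduation; the IQR then indicates how much the runs disagree.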

Evidence

Signal A — Primary source

https://arxiv.org/pdf/2504.11443 (credibility: medium)

M365 Copilot is designed as a general purpose tool to help workers digest information by summarizing emails, meetings, or documents, create new ...

Signal D — Demand proxy

Found: yes

Summary: Hacker News discussion surfaces community frustration that Copilot broke audit logs and Microsoft lacks transparency about it, directly evidencing demand for independent copilot accountability tooling in enterprise workflows.

Source: https://news.ycombinator.com/item?id=44957454

Reason: HN thread 'Copilot broke audit logs, but Microsoft won't tell customers' shows real practitioner concern about copilot observability and audit-trail integrity, the exact trust gap the Promise Ledger targets. Remaining results (Reddit SaaS feedback, Facebook vibe-coding post) a…

Evaluation history

When	Stage	Phase
2026-05-13 08:34	deep_council_verdict	graduated
2026-05-13 08:30	deep_claude_take	graduated
2026-05-13 08:28	deep_90day_plan	graduated
2026-05-13 08:27	deep_risk	graduated
2026-05-13 08:26	deep_distribution	graduated
2026-05-13 08:24	deep_pricing	graduated
2026-05-13 08:23	deep_moat	graduated
2026-05-13 08:22	deep_buyer_sim	graduated
2026-05-13 08:21	deep_icp	graduated
2026-05-13 08:20	deep_competitor	graduated
2026-05-13 08:18	deep_market_reality	graduated
2026-05-13 08:13	filter_score	scored
2026-05-13 08:07	filter_score	scored
2026-05-13 07:55	filter_score	scored
2026-05-13 07:50	filter_score	scored
2026-05-13 07:42	filter_score	scored
2026-05-13 07:37	filter_score	scored
2026-05-13 07:24	filter_score	scored
2026-05-13 07:18	filter_score	scored
2026-05-13 07:12	evidence_search	argument
2026-05-13 01:30	evidence_search	argument
2026-05-12 23:36	evidence_search	argument
2026-05-12 21:48	evidence_search	argument
2026-05-12 20:06	evidence_search	argument
2026-05-12 18:06	evidence_search	argument
2026-05-12 16:18	evidence_search	argument
2026-05-12 14:24	evidence_search	argument
2026-05-12 12:42	evidence_search	argument
2026-05-12 10:48	evidence_search	argument
2026-05-12 09:06	evidence_search	argument
2026-05-12 07:18	evidence_search	argument
2026-05-12 05:36	evidence_search	argument
2026-05-11 19:36	evidence_search	argument
2026-05-11 18:06	evidence_search	argument
2026-05-11 16:36	evidence_search	argument
2026-05-11 15:06	evidence_search	argument
2026-05-11 13:36	evidence_search	argument
2026-05-11 06:36	evidence_search	argument
2026-05-11 05:24	evidence_search	argument
2026-05-11 04:42	evidence_search	argument
2026-05-11 04:18	evidence_search	argument
2026-05-11 03:48	evidence_search	argument
2026-05-11 03:18	evidence_search	argument
2026-05-11 02:48	evidence_search	argument
2026-05-11 01:00	evidence_search	argument
2026-05-11 00:12	evidence_search	argument
2026-05-11 00:06	evidence_search	argument
2026-05-11 00:00	audience_simulation	argument
2026-05-10 23:54	red_team_kill	argument
2026-05-10 23:48	steelman	argument
2026-05-10 23:44	genesis	argument