Bot-Promise Slip Triage for B2B Support Operations

Status: graduated [TRIANGULATED] | Filter: 9.0/15 | Spread: ±1.0 | Signals: 2 independent

What is this?

A daily morning ledger that surfaces every customer-facing commitment a bot agent (Intercom Fin / Zendesk AI / Decagon / Ada) made in the last 24-72 hours that is now at risk of breach, before the customer escalates.

The buyer is the support ops lead at a 50-300-person B2B SaaS company running one of these bot agents on inbound tickets. The bot constantly promises resolution dates, refunds, escalations, or engineering involvement it cannot guarantee, and the human team only finds out when the customer comes back angry.

The product consumes the bot platform's structured event log (deadline_promised, refund_offered, escalated_to_human; each event type is manually tagged once at setup as a commitment category). It then runs AE's adversarial multi-model debate to challenge each event against the current ticket state and the historical resolution patterns of similar tickets, and ranks tickets by breach probability. Each morning the lead works the top 10, salvaging those customers before NPS damage. Outcome ground truth resolves within 3-14 days as tickets close, which closes AE's grading loop on real outcomes rather than on a vendor's self-rating of its own bot.
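The ranking step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the product's implementation: `CommitmentEvent`, `morning_ledger`, and the `breach_prob` field are hypothetical names; the breach probability is assumed to be precomputed by the debate step, which the sketch does not model; and the 24-72 hour range is treated as a configurable look-back window defaulting to 72 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Commitment categories from the description; tagged once at setup.
COMMITMENT_CATEGORIES = {"deadline_promised", "refund_offered", "escalated_to_human"}


@dataclass
class CommitmentEvent:
    ticket_id: str
    category: str        # event type from the bot platform's log
    made_at: datetime    # when the bot made the promise (timezone-aware)
    breach_prob: float   # hypothetical: precomputed 0.0-1.0 risk from the debate step


def morning_ledger(events, now, window=timedelta(hours=72), top_n=10):
    """Rank tickets by breach probability.

    Only tagged commitment events made within `window` before `now` count,
    and each ticket is ranked by its single riskiest commitment.
    """
    worst_by_ticket = {}
    for ev in events:
        if ev.category not in COMMITMENT_CATEGORIES:
            continue  # not a tagged commitment
        if not (now - window <= ev.made_at <= now):
            continue  # outside the look-back window
        current = worst_by_ticket.get(ev.ticket_id)
        if current is None or ev.breach_prob > current.breach_prob:
            worst_by_ticket[ev.ticket_id] = ev
    ranked = sorted(worst_by_ticket.values(), key=lambda e: e.breach_prob, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    now = datetime(2026, 5, 13, 8, 0, tzinfo=timezone.utc)
    events = [
        CommitmentEvent("T-1", "refund_offered", now - timedelta(hours=30), 0.82),
        CommitmentEvent("T-2", "deadline_promised", now - timedelta(hours=5), 0.40),
        CommitmentEvent("T-1", "deadline_promised", now - timedelta(hours=50), 0.55),
        # Older than 72 hours, so it drops out of today's ledger.
        CommitmentEvent("T-3", "escalated_to_human", now - timedelta(hours=100), 0.99),
    ]
    for ev in morning_ledger(events, now):
        print(ev.ticket_id, ev.category, ev.breach_prob)
```

Ranking each ticket by its riskiest open commitment keeps one row per customer, so the lead's top 10 are ten distinct tickets rather than ten events.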

Why did we consider it?

Bot agents make promises their human teams cannot keep. AE's combination of adversarial debate and grading against real outcomes is uniquely suited to ranking breach risk before NPS damage occurs, and the buyer, integration, and price all fit a solo UK operator working evenings and weekends.

What breaks?

The Band-Aid Fallacy: Buyers will disable or restrict rogue bots making false financial/timeline promises rather than paying for a third-party triage tool to monitor them.
API Reality & Rate Limits: Bot platforms don't emit structured logs for hallucinated promises; parsing raw transcripts at scale will crush a solo dev with rate limits (per dblock's 'AI Slop' warning).
The HITL Bottleneck: Forcing Support Ops to manually salvage bot mistakes daily creates an unscalable human bottleneck (per Tian Pan's 'Human Review Queue' analysis).
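To make the extraction risk concrete, here is a deliberately naive sketch of what tagging raw transcripts, rather than structured events, might look like. The phrase patterns, the `(speaker, text)` transcript shape, and `tag_bot_turns` are all hypothetical; only the three category names come from the product description.

```python
import re

# Hypothetical phrase patterns per commitment category (illustrative only).
PATTERNS = {
    "deadline_promised": re.compile(
        r"\b(by (monday|tuesday|wednesday|thursday|friday|tomorrow|end of (day|week))"
        r"|within \d+ (hours?|days?))\b",
        re.IGNORECASE,
    ),
    "refund_offered": re.compile(r"\b(refund|credit (to|on) your account)\b", re.IGNORECASE),
    "escalated_to_human": re.compile(
        r"\b(escalat(e|ed|ing)|(our|the) engineering team|a human agent)\b", re.IGNORECASE
    ),
}


def tag_bot_turns(transcript):
    """Return (turn_index, category) for every bot turn matching a pattern.

    `transcript` is assumed to be a list of (speaker, text) pairs.
    """
    hits = []
    for i, (speaker, text) in enumerate(transcript):
        if speaker != "bot":
            continue  # only the bot's own statements count as commitments
        for category, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((i, category))
    return hits


example = [
    ("customer", "My export has been broken for two days."),
    ("bot", "Sorry about that! I've escalated this to our engineering team "
            "and it will be fixed within 48 hours."),
    ("bot", "I can also issue a refund for this month."),
]
print(tag_bot_turns(example))
# [(1, 'deadline_promised'), (1, 'escalated_to_human'), (2, 'refund_offered')]
```

Keyword heuristics like this miss paraphrased promises and flag non-commitments (for example, a bot merely explaining the refund policy), which is why the unvalidated extraction premise matters.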

What did we learn?

Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). The pain is real and structural, but the extraction premise is unvalidated and the go-to-market (GTM) fit is hostile to an introverted solo founder. A 7-day signal check is needed before committing.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis	What it measures
data moat	Does this product accumulate proprietary data that compounds?
10x model test	Does a better model make this more valuable, or redundant?
fast feedback loops	Can outputs be graded against reality in <30 days?
solo founder feasible	Can a solo operator build and run this without a team?
AI providers can't eat it	Do hyperscalers have structural reasons NOT to build this?
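The fast-feedback axis rests on breach predictions becoming gradeable once tickets close. One standard way to grade probabilistic predictions against 0/1 outcomes is the Brier score; using it here is an assumption, since the report does not name a metric. `ResolvedPrediction` and `brier_score` are hypothetical names, and `max_days=30` mirrors the axis's <30-day bar.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ResolvedPrediction:
    ticket_id: str
    predicted_breach_prob: float  # score shown in the morning ledger
    breached: bool                # ground truth once the ticket closes
    predicted_on: date
    closed_on: date


def brier_score(preds, max_days=30):
    """Mean squared error between predicted breach probability and the 0/1
    outcome, over predictions that resolved within `max_days`.

    Lower is better; a constant 0.5 forecast always scores 0.25.
    """
    graded = [p for p in preds if (p.closed_on - p.predicted_on).days <= max_days]
    if not graded:
        return None  # nothing has resolved inside the window yet
    return sum((p.predicted_breach_prob - float(p.breached)) ** 2 for p in graded) / len(graded)
```

Excluding slow-resolving tickets keeps the grade tied to the <30-day loop, at the cost of ignoring the longest-running (and possibly riskiest) cases.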

Composite median: 9.0 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
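The composite arithmetic can be sketched as follows. This assumes "composite median" means the median of each run's five-axis total (the report could also mean a sum of per-axis medians) and that the IQR uses Python's `inclusive` quartile method; both are assumptions. The threshold comparison is inclusive, consistent with a 9.0 median graduating at a 9.0 threshold. `composite` and `summarize` are hypothetical names.

```python
import statistics

GRADUATION_THRESHOLD = 9.0  # from the report


def composite(scores):
    """Sum one run's five axis scores (each 0-3) into a 0-15 composite."""
    if len(scores) != 5:
        raise ValueError("expected exactly five axis scores")
    if any(not 0 <= s <= 3 for s in scores):
        raise ValueError("each axis score must be between 0 and 3")
    return sum(scores)


def summarize(runs):
    """Return (median composite, IQR, graduated?) across independent runs."""
    totals = [composite(run) for run in runs]
    median = statistics.median(totals)
    # Assumed quartile convention. With exactly three runs, 'inclusive'
    # quartiles give an IQR of (max - min) / 2.
    q1, _, q3 = statistics.quantiles(totals, n=4, method="inclusive")
    return median, q3 - q1, median >= GRADUATION_THRESHOLD
```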

Evidence

Signal B — Competitor with documented gap

https://www.usefini.com/blog/ai-ticket-triage-automation

Fini and similar AI triage tools (Wizr, LiveChatAI) focus on routing and resolving inbound tickets but do not retroactively audit commitments the bot itself made, detect at-risk promises, or rank tickets by breach probability before customer escalation. The gap is post-promise monitoring: no existing tool treats the bot's own outputs as liabilities to be triaged.

Signal D — Demand proxy

Found: yes. Multiple content signals confirm the problem space: chatbot mistakes in customer support (broken escalations, poor handoffs) are widely discussed, B2B support teams struggle with reactive ticket-chasing, and the gap between bot automation and operational control is a recognized theme. However, no forum threads or GitHub issues specifically discuss bot-promise breach detection.

Sources:
https://www.nurix.ai/resources/chatbot-mistakes-customer-support
https://front.com/blog/customer-service-automation
https://front.com/blog/b2b-customer-service
https://livechat…

Evaluation history

When	Stage	Phase
2026-05-13 03:37	deep_council_verdict	graduated
2026-05-13 03:36	deep_claude_take	graduated
2026-05-13 03:34	deep_90day_plan	graduated
2026-05-13 03:33	deep_risk	graduated
2026-05-13 03:32	deep_distribution	graduated
2026-05-13 03:30	deep_pricing	graduated
2026-05-13 03:29	deep_moat	graduated
2026-05-13 03:28	deep_buyer_sim	graduated
2026-05-13 03:27	deep_icp	graduated
2026-05-13 03:25	deep_competitor	graduated
2026-05-13 03:25	deep_market_reality	graduated
2026-05-13 03:18	filter_score	scored
2026-05-13 03:12	filter_score	scored
2026-05-13 03:06	filter_score	scored
2026-05-13 03:00	evidence_search	argument
2026-05-13 00:24	evidence_search	argument
2026-05-12 22:36	evidence_search	argument
2026-05-12 20:48	evidence_search	argument
2026-05-12 18:54	evidence_search	argument
2026-05-12 17:00	evidence_search	argument
2026-05-12 15:12	evidence_search	argument
2026-05-12 13:24	evidence_search	argument
2026-05-12 11:36	evidence_search	argument
2026-05-12 09:48	evidence_search	argument
2026-05-12 08:06	evidence_search	argument
2026-05-12 06:18	evidence_search	argument
2026-05-11 20:30	evidence_search	argument
2026-05-11 18:48	evidence_search	argument
2026-05-11 17:18	evidence_search	argument
2026-05-11 15:48	evidence_search	argument
2026-05-11 14:24	evidence_search	argument
2026-05-11 12:54	evidence_search	argument
2026-05-11 12:24	evidence_search	argument
2026-05-11 11:54	evidence_search	argument
2026-05-11 11:30	evidence_search	argument
2026-05-11 11:18	evidence_search	argument
2026-05-11 11:06	evidence_search	argument
2026-05-11 10:54	evidence_search	argument
2026-05-11 10:42	evidence_search	argument
2026-05-11 10:36	evidence_search	argument
2026-05-11 10:24	evidence_search	argument
2026-05-11 10:18	audience_simulation	argument
2026-05-11 10:12	red_team_kill	argument
2026-05-11 10:06	steelman	argument
2026-05-11 10:02	genesis	argument