Bot-Promise Slip Triage for B2B Support Operations
graduated [TRIANGULATED] | filter 9.0/15 | spread ±1.0 | signals: 2 independent
What is this?
A daily morning ledger that surfaces every customer-facing commitment a bot agent (Intercom Fin / Zendesk AI / Decagon / Ada) made in the last 24-72 hours that is now at risk of breach, before the customer escalates. The buyer is the support ops lead at a 50-300 person B2B SaaS company running one of these bot agents on inbound tickets. The bot routinely promises resolution dates, refunds, escalations, or engineering involvement that it cannot guarantee, and the human team only finds out when the customer comes back angry. The product consumes the bot platform's structured event log (deadline_promised, refund_offered, escalated_to_human; tagged manually once at setup as commitment categories), runs AE's adversarial multi-model debate to challenge each event against the current ticket state and the resolution patterns of similar historical tickets, then ranks tickets by breach probability. The lead works the top 10 each morning, salvaging the customer relationship before NPS damage sets in. Outcome ground truth arrives within 3-14 days as tickets close, which closes AE's grading loop against real outcomes rather than a vendor's self-rating of its own bot.
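The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the three event names come from the description, but `CommitmentEvent`, `top_at_risk`, and the `breach_probability` callable are hypothetical, and the adversarial debate that would produce the probability is deliberately left as a caller-supplied function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable

# Event names taken from the description above; all other names are hypothetical.
COMMITMENT_KINDS = {"deadline_promised", "refund_offered", "escalated_to_human"}


@dataclass(frozen=True)
class CommitmentEvent:
    ticket_id: str
    kind: str          # expected to be one of COMMITMENT_KINDS
    made_at: datetime  # when the bot made the commitment


def top_at_risk(
    events: Iterable[CommitmentEvent],
    breach_probability: Callable[[CommitmentEvent], float],
    now: datetime,
    window: timedelta = timedelta(hours=72),
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Return up to `limit` (ticket_id, probability) pairs, riskiest first."""
    worst: dict[str, float] = {}
    for ev in events:
        if ev.kind not in COMMITMENT_KINDS:
            continue  # not a tagged commitment category
        if not (now - window <= ev.made_at <= now):
            continue  # outside the lookback window
        p = breach_probability(ev)
        # Assumption: a ticket is as risky as its riskiest commitment.
        worst[ev.ticket_id] = max(p, worst.get(ev.ticket_id, 0.0))
    ranked = sorted(worst.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]
```

Keeping the scorer as a parameter means the same ledger logic works whether the probability comes from a model debate, a historical base rate, or a fixed stub used in testing.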
Why did we consider it?
Bot agents make promises their human teams cannot keep; AE's adversarial debate combined with reality-graded scoring is uniquely suited to ranking breach risk before NPS damage, and the buyer, integration, and price all fit a solo UK operator working evenings and weekends.
What breaks?
- The Band-Aid Fallacy: Buyers will disable or restrict rogue bots making false financial/timeline promises rather than paying for a third-party triage tool to monitor them.
- API Reality & Rate Limits: Bot platforms don't emit structured logs for hallucinated promises; parsing raw transcripts at scale will crush a solo dev with rate limits (per dblock's 'AI Slop' warning).
- The HITL Bottleneck: Forcing Support Ops to manually salvage bot mistakes daily creates an unscalable human bottleneck (per Tian Pan's 'Human Review Queue' analysis).
What did we learn?
Engine verdict: GATHER_MORE_SIGNAL (WORTH_SKIMMING). The structural pain is real, but the extraction premise is unvalidated and the go-to-market fit is hostile to an introverted solo founder. A 7-day signal check is needed before committing.
Filter scores
Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.
| Axis | What it measures |
|---|---|
| data moat | Does this product accumulate proprietary data that compounds? |
| 10x model test | Does a better model make this more valuable, or redundant? |
| fast feedback loops | Can outputs be graded against reality in <30 days? |
| solo founder feasible | Can a solo operator build and run this without a team? |
| AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this? |
Composite median: 9.0 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
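The arithmetic behind the composite line can be sketched as below. The report does not say whether the composite is the median of per-run totals or the sum of per-axis medians, nor which quartile convention the IQR uses; this sketch assumes the median of per-run totals and Python's "inclusive" quartile method. The axis keys and function names are hypothetical shorthand for the five axes in the table; only the 0-3 range, the 15-point maximum, and the 9.0 threshold come from the report.

```python
from statistics import median, quantiles

# Hypothetical short keys for the five axes listed in the table.
AXES = ("data_moat", "model_10x", "fast_feedback", "solo_feasible", "provider_proof")
GRADUATION_THRESHOLD = 9.0  # stated in the report


def composite(run: dict[str, int]) -> int:
    """Sum one run's five axis scores (each 0-3, so the total is 0-15)."""
    for axis in AXES:
        if not 0 <= run[axis] <= 3:
            raise ValueError(f"{axis} score {run[axis]} is outside 0-3")
    return sum(run[axis] for axis in AXES)


def summarize(runs: list[dict[str, int]]) -> dict[str, object]:
    """Median and IQR of per-run composites, plus the graduation decision."""
    totals = [composite(r) for r in runs]
    # Assumption: quartiles use the 'inclusive' method; other conventions
    # give a different IQR for the same three runs.
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    med = median(totals)
    return {
        "median": med,
        "iqr": q3 - q1,
        # A composite equal to the threshold graduates, as in the report (9.0 vs 9.0).
        "graduated": med >= GRADUATION_THRESHOLD,
    }
```

With only three runs the IQR is very sensitive to the quartile convention, so the spread figure is best read as a rough indication of disagreement between runs rather than a precise statistic.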
Evidence
Signal B — Competitor with documented gap
Fini and similar AI triage tools (Wizr, LiveChatAI) focus on routing and resolving inbound tickets but do not retroactively audit commitments the bot itself made, detect at-risk promises, or rank tickets by breach probability before customer escalation. The gap is post-promise monitoring: no existing tool treats the bot's own outputs as liabilities to be triaged.
Signal D — Demand proxy
Found: yes. Multiple content signals confirm the problem space: chatbot mistakes in customer support (broken escalations, poor handoffs) are widely discussed, B2B support teams struggle with reactive ticket-chasing, and the gap between bot automation and operational control is a recognized theme. However, no forum threads or GitHub issues specifically discuss bot-promise breach detection.
Sources:
- https://www.nurix.ai/resources/chatbot-mistakes-customer-support
- https://front.com/blog/customer-service-automation
- https://front.com/blog/b2b-customer-service
- https://livechat…
Evaluation history
| When | Stage | Phase |
|---|---|---|
| 2026-05-13 03:37 | deep_council_verdict | graduated |
| 2026-05-13 03:36 | deep_claude_take | graduated |
| 2026-05-13 03:34 | deep_90day_plan | graduated |
| 2026-05-13 03:33 | deep_risk | graduated |
| 2026-05-13 03:32 | deep_distribution | graduated |
| 2026-05-13 03:30 | deep_pricing | graduated |
| 2026-05-13 03:29 | deep_moat | graduated |
| 2026-05-13 03:28 | deep_buyer_sim | graduated |
| 2026-05-13 03:27 | deep_icp | graduated |
| 2026-05-13 03:25 | deep_competitor | graduated |
| 2026-05-13 03:25 | deep_market_reality | graduated |
| 2026-05-13 03:18 | filter_score | scored |
| 2026-05-13 03:12 | filter_score | scored |
| 2026-05-13 03:06 | filter_score | scored |
| 2026-05-13 03:00 | evidence_search | argument |
| 2026-05-13 00:24 | evidence_search | argument |
| 2026-05-12 22:36 | evidence_search | argument |
| 2026-05-12 20:48 | evidence_search | argument |
| 2026-05-12 18:54 | evidence_search | argument |
| 2026-05-12 17:00 | evidence_search | argument |
| 2026-05-12 15:12 | evidence_search | argument |
| 2026-05-12 13:24 | evidence_search | argument |
| 2026-05-12 11:36 | evidence_search | argument |
| 2026-05-12 09:48 | evidence_search | argument |
| 2026-05-12 08:06 | evidence_search | argument |
| 2026-05-12 06:18 | evidence_search | argument |
| 2026-05-11 20:30 | evidence_search | argument |
| 2026-05-11 18:48 | evidence_search | argument |
| 2026-05-11 17:18 | evidence_search | argument |
| 2026-05-11 15:48 | evidence_search | argument |
| 2026-05-11 14:24 | evidence_search | argument |
| 2026-05-11 12:54 | evidence_search | argument |
| 2026-05-11 12:24 | evidence_search | argument |
| 2026-05-11 11:54 | evidence_search | argument |
| 2026-05-11 11:30 | evidence_search | argument |
| 2026-05-11 11:18 | evidence_search | argument |
| 2026-05-11 11:06 | evidence_search | argument |
| 2026-05-11 10:54 | evidence_search | argument |
| 2026-05-11 10:42 | evidence_search | argument |
| 2026-05-11 10:36 | evidence_search | argument |
| 2026-05-11 10:24 | evidence_search | argument |
| 2026-05-11 10:18 | audience_simulation | argument |
| 2026-05-11 10:12 | red_team_kill | argument |
| 2026-05-11 10:06 | steelman | argument |
| 2026-05-11 10:02 | genesis | argument |