A self-grading hypothesis engine

Abstract Essence evaluates product opportunities the way it would evaluate any other claim: through adversarial multi-model debate, structured filters, and a public verdict for every candidate. This dashboard is the live record.

What this is

What is it?

An autonomous engine that proposes product hypotheses, argues them, gathers evidence, and scores them. Multiple frontier LLMs debate each candidate. A five-axis filter scores each one across three independent runs. Survivors graduate; failures are killed with documented reasoning.
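As a rough illustration of how a five-axis filter over three independent runs could turn into a graduate-or-kill verdict, here is a minimal sketch. The axis names, the 0-10 scale, the bar of 7.0, and the rule that every run must clear the bar are all assumptions made for the example; the engine's actual rubric and aggregation rule are not described on this page.

```python
from statistics import mean

# Hypothetical axis names: the page only says the filter has five axes.
AXES = ("axis_1", "axis_2", "axis_3", "axis_4", "axis_5")
GRADUATION_BAR = 7.0  # assumed threshold on an assumed 0-10 scale
RUNS_REQUIRED = 3     # the page states three independent runs


def run_score(run: dict[str, float]) -> float:
    """Average the five axis scores of a single filter run."""
    missing = [axis for axis in AXES if axis not in run]
    if missing:
        raise ValueError(f"run is missing axes: {missing}")
    return mean(run[axis] for axis in AXES)


def verdict(runs: list[dict[str, float]]) -> str:
    """Return 'graduated' only if every run clears the bar (assumed rule)."""
    if len(runs) != RUNS_REQUIRED:
        raise ValueError(f"expected {RUNS_REQUIRED} runs, got {len(runs)}")
    cleared = all(run_score(run) >= GRADUATION_BAR for run in runs)
    return "graduated" if cleared else "killed"
```

Requiring every run to clear the bar, rather than averaging across runs, is one way to keep a single lucky run from carrying a candidate; a mean or median across runs would be an equally plausible design.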

Why this approach?

Anyone can generate convincing-looking product ideas. The harder question is which ones survive structured scrutiny. The engine answers that question on the record, with the reasoning visible.

What breaks?

The engine grades its own filter coverage. New failure modes (rubric blind spots, structural mismatches, distribution-shape mistakes) are surfaced through Commander overrides and patched into the next prompt revision. Every override is logged.

What we have learned so far

Buyer-side products consistently outperform seller-side ones when the seller monetises conviction. Structural fit (workflow shape, build complexity) matters more than scoring fit. Filter scores above the graduation bar are necessary but not sufficient: Commander review still kills a meaningful share of graduated candidates.

Featured candidate

Slot paused — no candidate committed

revisit: 2026-06-30

Commander has not yet reviewed the full deep dossier of any graduated candidate. A featured-slot commitment requires an explicit commander_override_action recorded in engine.db, not Architect inference. A Commander-private dossier reading surface is being built (S157) so that Commander can review candidates end-to-end before any pick is committed publicly.
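The gate described above can be sketched as a simple lookup: a candidate may be featured only if an explicit override row exists for it. This is a hedged sketch, not the engine's code. Only the names engine.db and commander_override_action come from this page; the `overrides` table, its `hypothesis_id` column, and the action value "feature" are hypothetical, and an in-memory SQLite database stands in for engine.db.

```python
import sqlite3

# Hypothetical schema: only commander_override_action is named on the page.
SCHEMA = """
CREATE TABLE overrides (
    hypothesis_id INTEGER NOT NULL,
    commander_override_action TEXT
)
"""


def can_feature(conn: sqlite3.Connection, hypothesis_id: int) -> bool:
    """True only if an explicit override action is recorded for the candidate."""
    row = conn.execute(
        "SELECT 1 FROM overrides "
        "WHERE hypothesis_id = ? AND commander_override_action IS NOT NULL "
        "LIMIT 1",
        (hypothesis_id,),
    ).fetchone()
    return row is not None


# In-memory stand-in for engine.db, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO overrides (hypothesis_id, commander_override_action) VALUES (?, ?)",
    (42, "feature"),  # "feature" is a made-up action value
)
print(can_feature(conn, 42))  # True: an explicit action is recorded
print(can_feature(conn, 7))   # False: no row, so nothing can be inferred
```

The point of the check is that absence of a row never counts as approval, which matches the rule that Architect inference is not enough.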

Engine state

122 total hypotheses

graduated

in flight

killed / exhausted

3,474 moves logged

$270 engine spend lifetime

See all hypotheses →