A self-grading hypothesis engine
Abstract Essence evaluates product opportunities the way it would evaluate any other claim: adversarial multi-model debate, structured filters, and a public verdict for every candidate. This dashboard is the live record.
What this is
What is it?
An autonomous engine that proposes, argues, evidences, and scores product hypotheses. Multiple frontier LLMs debate each candidate. A five-axis filter scores each one across three independent runs. Survivors graduate; failures are killed with documented reasoning.
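As a rough illustration of the scoring step described above, the sketch below averages five axis scores over three independent runs and compares the result to a graduation bar. The axis names, the 0-10 scale, the bar value, and the use of a simple mean are all assumptions made for this example; the engine's actual axes and aggregation rule are not specified here.

```python
from statistics import mean

# Hypothetical placeholder axis names; the real five axes are not named in this page.
AXES = ("axis_1", "axis_2", "axis_3", "axis_4", "axis_5")
# Hypothetical graduation bar on an assumed 0-10 scale.
GRADUATION_BAR = 7.0
REQUIRED_RUNS = 3


def aggregate_runs(runs):
    """Average each axis across the independent runs."""
    if len(runs) != REQUIRED_RUNS:
        raise ValueError(f"expected {REQUIRED_RUNS} runs, got {len(runs)}")
    for run in runs:
        if set(run) != set(AXES):
            raise ValueError("each run must score exactly the five axes")
    return {axis: mean(run[axis] for run in runs) for axis in AXES}


def verdict(runs, bar=GRADUATION_BAR):
    """Return 'graduate' if the mean of the per-axis averages meets the bar, else 'kill'."""
    per_axis = aggregate_runs(runs)
    overall = mean(per_axis.values())
    return "graduate" if overall >= bar else "kill"


if __name__ == "__main__":
    strong = [{axis: 8 for axis in AXES} for _ in range(REQUIRED_RUNS)]
    print(verdict(strong))
```

Averaging across runs is one simple way to damp run-to-run variance; a stricter variant could require every run to clear the bar on its own.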
Why this approach?
Anyone can generate convincing-looking product ideas. The harder question is which ones survive structured scrutiny. The engine answers that question on the record, with the reasoning visible.
What breaks?
The engine grades its own filter coverage. New failure modes (rubric blindspots, structural mismatches, distribution-shape mistakes) are surfaced via Commander overrides and patched into the next prompt revision. Every override is logged.
What we have learned so far
Buyer-side products consistently outperform seller-side ones when the seller monetises conviction. Structural fit (workflow shape, build complexity) matters more than scoring fit. Filter scores above the graduation bar are necessary but not sufficient: Commander review still kills a meaningful share of graduated candidates.
Featured candidate
Slot paused — no candidate committed
revisit: 2026-06-30
Commander has not yet reviewed any graduated candidate's full deep dossier. Featured-slot commitments require an explicit commander_override_action recorded in engine.db, not Architect inference. A Commander-private dossier reading surface is being built (S157) so Commander can review candidates end-to-end before any pick is committed publicly.
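The gating rule above could be checked along these lines. This is only a sketch: the table name `commander_overrides`, the `candidate_id` column, and the idea that `commander_override_action` is a column are assumptions for illustration, since the real schema of engine.db is not shown here.

```python
import sqlite3


def create_demo_schema(conn):
    """Create a hypothetical overrides table; the real engine.db schema may differ."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commander_overrides ("
        "candidate_id TEXT NOT NULL, "
        "commander_override_action TEXT)"
    )


def can_commit_featured(conn, candidate_id):
    """Allow a featured-slot commitment only if an explicit override action is recorded."""
    row = conn.execute(
        "SELECT 1 FROM commander_overrides "
        "WHERE candidate_id = ? AND commander_override_action IS NOT NULL "
        "LIMIT 1",
        (candidate_id,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for engine.db
    create_demo_schema(conn)
    print(can_commit_featured(conn, "candidate-a"))
```

With no recorded action the check returns False, which matches the paused slot: inference alone never unlocks a public pick.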
Engine state
$270
lifetime engine spend
See all hypotheses →