Outside-Counsel Citation & Holding Pre-Reliance Check for In-House Legal

Phase: ranked | [TRIANGULATED] | Filter: 8.5/15 | Spread: ±1.0 | Signals: 3 independent
What is this?
A pre-reliance gate for in-house General Counsel. Before relying on an outside-counsel deliverable for board advice, contract sign-off, or regulator filings, the GC pastes in its specific factual and legal claims: cited cases, claimed holdings, statutory interpretations, regulatory conclusions. The product runs an adversarial multi-model debate against each claim using public legal sources (CourtListener, Justia, govinfo, EUR-Lex, regulator archives), then categorises any failures using AE's six-pattern reasoning taxonomy. Did outside counsel's AI cite a nonexistent case (Fatal Grounding Immunity), cite a real case whose holding doesn't support the proposition (Premise-Conclusion Severing), dress up dicta as binding precedent (Concession Laundering), or apply a superseded statute (Temporal & Transmission Blindness)? Buyers are founder-led or PE-backed firms of 50 to 200 people whose in-house legal team of 1 to 5 regularly uses 2 to 5 outside counsel firms. AE fits because the autopsy taxonomy was purpose-built for these reasoning failure modes, and adversarial debate avoids the LLM-as-judge collapse that any single-model verifier hits when grading legal text.
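As a rough illustration of the gate described above, the following sketch models a checked claim and the four failure patterns the description names. Everything here is an assumption for illustration: the class and function names (`FailurePattern`, `ClaimCheck`, `blocking_checks`) are hypothetical, the sample claims are placeholders, and the two taxonomy patterns not named in the description are deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FailurePattern(Enum):
    """The four patterns named in the description (the taxonomy has six)."""
    FATAL_GROUNDING_IMMUNITY = "cited case does not exist"
    PREMISE_CONCLUSION_SEVERING = "real case, holding does not support the proposition"
    CONCESSION_LAUNDERING = "dicta presented as binding precedent"
    TEMPORAL_TRANSMISSION_BLINDNESS = "superseded statute applied"


@dataclass
class ClaimCheck:
    """One pasted claim plus the outcome of the (hypothetical) debate step."""
    claim: str
    citation: str
    failure: Optional[FailurePattern] = None  # None means no failure was flagged

    @property
    def safe_to_rely(self) -> bool:
        return self.failure is None


def blocking_checks(checks: List[ClaimCheck]) -> List[ClaimCheck]:
    """Return the claims that should stop the GC from relying on the memo."""
    return [c for c in checks if not c.safe_to_rely]


# Placeholder claims; the citations are invented stand-ins, not real authorities.
checks = [
    ClaimCheck("Clause 4 is enforceable", "Placeholder v. Example (hypothetical)"),
    ClaimCheck("Notice period is 30 days", "Hypothetical Act s. 12",
               failure=FailurePattern.TEMPORAL_TRANSMISSION_BLINDNESS),
]

for blocked in blocking_checks(checks):
    print(f"BLOCK: {blocked.claim} ({blocked.failure.name}: {blocked.failure.value})")
```

Keeping the failure as an optional enum value rather than a free-text note means every blocked claim maps to exactly one named pattern, which is what would let failures be counted per outside-counsel firm over time.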
Why did we consider it?
AE's reasoning-failure taxonomy and adversarial grading rig are purpose-built for the exact verification gap GCs face when relying on outside counsel's AI-drafted legal claims.
What breaks?
  • Risk Transfer Paradox: GCs hire outside counsel to offload liability; re-verifying counsel's work in-house with an uninsurable tool forces the GC to re-assume malpractice risk.
  • Workflow Friction: Overworked 1-5 person in-house teams will not manually copy-paste citations from premium memos into a standalone verification engine.
  • Data Governance & Privilege: Pasting highly confidential legal strategy into a part-time solo developer's tool violates strict corporate security and attorney-client privilege norms.
What did we learn?
Still in evaluation (phase: ranked). No verdict yet.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis | What it measures
data moat | Does this product accumulate proprietary data that compounds?
10x model test | Does a better model make this more valuable, or redundant?
fast feedback loops | Can outputs be graded against reality in <30 days?
solo founder feasible | Can a solo operator build and run this without a team?
AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this?
Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
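The arithmetic behind these figures can be sketched as follows. The page does not publish the per-run scores, nor say whether the composite is the median of run totals or the sum of per-axis medians, so this sketch makes two loud assumptions: the run scores below are hypothetical values chosen only to reproduce the reported 8.5 and 1.0, and the composite is taken as the median of per-run totals. Half-point scores are also an assumption, since a total of 8.5 is not reachable with whole-number scores under this aggregation.

```python
from statistics import median, quantiles

N_AXES = 5                   # five axes, each scored 0-3 (from the page)
GRADUATION_THRESHOLD = 9.0   # from the page

# HYPOTHETICAL per-run axis scores; the real ones are not shown on the page.
runs = [
    [1, 2, 2, 2, 1],      # run 1 total: 8
    [1, 2, 2, 2, 1.5],    # run 2 total: 8.5
    [2, 2, 2, 2, 1],      # run 3 total: 9
]


def composite(run):
    """Sum one run's axis scores after checking count and range."""
    if len(run) != N_AXES:
        raise ValueError(f"expected {N_AXES} axis scores, got {len(run)}")
    if any(not 0 <= score <= 3 for score in run):
        raise ValueError("each axis score must lie in 0-3")
    return sum(run)


composites = [composite(run) for run in runs]
composite_median = median(composites)

# Quartiles via the standard library's default ("exclusive") method;
# other IQR conventions can give a different spread on three points.
q1, _, q3 = quantiles(composites, n=4)
iqr = q3 - q1

graduates = composite_median >= GRADUATION_THRESHOLD
print(f"median={composite_median} iqr={iqr} graduates={graduates}")
```

Under these assumptions the hypothesis sits half a point below the graduation threshold, which is consistent with it remaining in the ranked phase without a verdict.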

Evidence

Signal A — Primary source

In this paper, we propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations.

Signal B — Competitor with documented gap

Aline validates that statute references point to the correct statute (single-model verification of citation existence), but the snippet shows no adversarial multi-model debate, no check of whether a cited holding actually supports the stated proposition, and no reasoning-failure taxonomy (hallucinated cases, dicta-as-precedent, superseded statutes). It addresses citation existence but not citation soundness.

Signal D — Demand proxy

Found: yes
Summary: Active discussion in legal practitioner and tech communities about judges sanctioning attorneys for AI-generated bogus citations, and a 2026 S.D.N.Y. case raising novel questions about AI use in legal practice, both demonstrating acute awareness among legal professionals that AI-generated legal content requires pre-reliance verification.
Sources:
https://www.reddit.com/r/paralegal/comments/1rwf846/ai_in_the_office/
https://news.ycombinator.com/item?id=47778920
Reason: Reddit r/paralegal thread describes judges sanctioning counsel for submitting bogus AI cita…

Evaluation history

When | Stage | Phase
2026-05-14 00:42 | evidence_search | ranked
2026-05-13 23:42 | evidence_search | ranked
2026-05-13 23:37 | evidence_search | ranked
2026-05-13 23:24 | evidence_search | ranked
2026-05-13 23:19 | evidence_search | ranked
2026-05-13 23:12 | evidence_search | ranked
2026-05-13 23:00 | filter_score | scored
2026-05-13 22:54 | filter_score | scored
2026-05-13 22:48 | filter_score | scored
2026-05-13 22:43 | evidence_search | argument
2026-05-13 22:36 | audience_simulation | argument
2026-05-13 22:24 | red_team_kill | argument
2026-05-13 22:18 | steelman | argument
2026-05-13 22:14 | genesis | argument