Outside-Counsel Citation & Holding Pre-Reliance Check for In-House Legal
Phase: ranked · [TRIANGULATED] · filter 8.5/15 · spread ±1.0 · signals: 3 independent
What is this?
A pre-reliance gate where in-house General Counsel (GCs) paste specific factual and legal claims from outside-counsel deliverables (cited cases, claimed holdings, statutory interpretations, regulatory conclusions) before relying on them for board advice, contract sign-off, or regulator filings. The product runs adversarial multi-model debate against each claim using public legal sources (CourtListener, Justia, govinfo, EUR-Lex, regulator archives), then categorises any failures using AE's six-pattern reasoning taxonomy. Did outside counsel's AI:
- cite a nonexistent case (Fatal Grounding Immunity),
- cite a real case whose holding doesn't support the proposition (Premise-Conclusion Severing),
- dress up dicta as binding precedent (Concession Laundering), or
- apply a superseded statute (Temporal & Transmission Blindness)?
Buyers are 50-200 person founder-led or PE-backed firms whose 1-5 person in-house legal team regularly uses 2-5 outside counsel firms. AE is uniquely suited because the autopsy taxonomy was purpose-built for exactly these reasoning failure modes, and adversarial debate prevents the LLM-as-judge collapse that any single-model verifier hits when grading legal text.
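As a rough illustration of how checked claims might be recorded against the taxonomy, here is a minimal sketch. The class names, the `safe_to_rely` rule, and the `group_by_pattern` helper are all hypothetical, and only the four patterns named above are modelled; the remaining two patterns of the six-pattern taxonomy are not listed in this description, so they are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailurePattern(Enum):
    """Only the four patterns named in the description (hypothetical encoding)."""
    FATAL_GROUNDING_IMMUNITY = "cited case does not exist"
    PREMISE_CONCLUSION_SEVERING = "real case, holding does not support the proposition"
    CONCESSION_LAUNDERING = "dicta presented as binding precedent"
    TEMPORAL_TRANSMISSION_BLINDNESS = "superseded statute applied"


@dataclass
class ClaimCheck:
    claim: str  # text pasted from the outside-counsel deliverable
    failures: list[FailurePattern] = field(default_factory=list)

    @property
    def safe_to_rely(self) -> bool:
        # Assumed gate rule: any categorised failure blocks reliance.
        return not self.failures


def group_by_pattern(checks: list[ClaimCheck]) -> dict[str, list[str]]:
    """Group claim texts under each failure pattern name for a reviewer report."""
    report: dict[str, list[str]] = {}
    for check in checks:
        for pattern in check.failures:
            report.setdefault(pattern.name, []).append(check.claim)
    return report


# Placeholder claims, not real citations.
checks = [
    ClaimCheck("Placeholder statute applies as amended"),
    ClaimCheck("Placeholder v. Example holds Y",
               [FailurePattern.FATAL_GROUNDING_IMMUNITY]),
]
print(group_by_pattern(checks))
```

The debate step itself is deliberately absent here; this sketch only covers what a record might look like once a failure has been categorised.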
Why did we consider it?
AE's reasoning-failure taxonomy and adversarial grading rig are purpose-built for the exact verification gap GCs face when relying on outside counsel's AI-drafted legal claims.
What breaks?
- Risk Transfer Paradox: GCs hire outside counsel to offload liability; verifying that counsel's work in-house with an uninsurable tool forces the GC to re-assume the malpractice risk.
- Workflow Friction: Overworked 1-5 person in-house teams will not manually copy-paste citations from premium memos into a standalone verification engine.
- Data Governance & Privilege: Pasting highly confidential legal strategy into a part-time solo developer's tool violates strict corporate security and attorney-client privilege norms.
What did we learn?
Still in evaluation (phase: ranked). No verdict yet.
Filter scores
Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.
| Axis | What it measures |
|---|---|
| data moat | Does this product accumulate proprietary data that compounds? |
| 10x model test | Does a better model make this more valuable, or redundant? |
| fast feedback loops | Can outputs be graded against reality in <30 days? |
| solo founder feasible | Can a solo operator build and run this without a team? |
| AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this? |
Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
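The arithmetic behind these figures can be sketched as follows. This is an assumption-laden illustration: the axis keys are hypothetical, the per-run scores are placeholders (not the actual runs behind the 8.5 figure), and it assumes the composite is each run's axis sum, with the median and IQR taken across runs and graduation meaning "median at or above the threshold".

```python
from statistics import median, quantiles

# Hypothetical keys standing in for the five axes in the table above.
AXES = ("data_moat", "ten_x_model", "fast_feedback", "solo_feasible", "provider_proof")
MAX_PER_AXIS = 3
GRADUATION_THRESHOLD = 9.0  # from the text; ">=" rather than ">" is an assumption


def composite(run):
    """Sum one run's axis scores after checking each lies in 0..3."""
    for axis in AXES:
        if not 0 <= run[axis] <= MAX_PER_AXIS:
            raise ValueError(f"{axis} score out of range: {run[axis]}")
    return sum(run[axis] for axis in AXES)


def summarize_runs(runs):
    """Median and interquartile range of the per-run composites."""
    totals = sorted(composite(run) for run in runs)
    q1, _, q3 = quantiles(totals, n=4)  # Python's default "exclusive" method
    mid = median(totals)
    return {"median": mid, "iqr": q3 - q1, "graduates": mid >= GRADUATION_THRESHOLD}


# Placeholder scores for three runs.
runs = [
    dict(zip(AXES, (1, 2, 2, 2, 1))),  # total 8
    dict(zip(AXES, (2, 2, 2, 2, 1))),  # total 9
    dict(zip(AXES, (2, 2, 2, 2, 2))),  # total 10
]
print(summarize_runs(runs))
```

With only three runs, the exclusive quartile method places Q1 and Q3 at the lowest and highest composites, so the IQR here equals the full range; a different quartile convention would give a different spread.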
Evidence
Signal A — Primary source
In this paper, we propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations.
Signal B — Competitor with documented gap
Aline validates that statute references point to the correct statute (single-model verification of citation existence), but the snippet shows no adversarial multi-model debate, no check of whether a cited holding actually supports the stated proposition, and no reasoning-failure taxonomy (hallucinated cases, dicta-as-precedent, superseded statutes). It addresses citation existence but not citation soundness.
Signal D — Demand proxy
Found: yes.
Summary: Active discussion in legal practitioner and tech communities about judges sanctioning attorneys for AI-generated bogus citations, and a 2026 S.D.N.Y. case raising novel questions about AI use in legal practice. Both demonstrate acute awareness among legal professionals that AI-generated legal content requires pre-reliance verification.
Sources:
- https://www.reddit.com/r/paralegal/comments/1rwf846/ai_in_the_office/
- https://news.ycombinator.com/item?id=47778920
Reason: Reddit r/paralegal thread describes judges sanctioning counsel for submitting bogus AI cita…
Evaluation history
| When | Stage | Phase |
|---|---|---|
| 2026-05-14 00:42 | evidence_search | ranked |
| 2026-05-13 23:42 | evidence_search | ranked |
| 2026-05-13 23:37 | evidence_search | ranked |
| 2026-05-13 23:24 | evidence_search | ranked |
| 2026-05-13 23:19 | evidence_search | ranked |
| 2026-05-13 23:12 | evidence_search | ranked |
| 2026-05-13 23:00 | filter_score | scored |
| 2026-05-13 22:54 | filter_score | scored |
| 2026-05-13 22:48 | filter_score | scored |
| 2026-05-13 22:43 | evidence_search | argument |
| 2026-05-13 22:36 | audience_simulation | argument |
| 2026-05-13 22:24 | red_team_kill | argument |
| 2026-05-13 22:18 | steelman | argument |
| 2026-05-13 22:14 | genesis | argument |