Outside-Counsel Citation & Holding Pre-Reliance Check for In-House Legal

Phase: ranked [TRIANGULATED]. Filter: 8.5/15, spread ±1.0. Signals: 3 independent.

What is this?

A pre-reliance gate for in-house General Counsel (GCs). Before relying on an outside-counsel deliverable for board advice, contract sign-off, or regulator filings, the GC pastes in its specific factual and legal claims: cited cases, claimed holdings, statutory interpretations, and regulatory conclusions.

The product runs an adversarial multi-model debate on each claim against public legal sources (CourtListener, Justia, govinfo, EUR-Lex, regulator archives), then categorises any failures using AE's six-pattern reasoning taxonomy. Did outside counsel's AI cite a nonexistent case (Fatal Grounding Immunity), cite a real case whose holding doesn't support the proposition (Premise-Conclusion Severing), dress up dicta as binding precedent (Concession Laundering), or apply a superseded statute (Temporal & Transmission Blindness)?

Buyers are 50-200 person founder-led or PE-backed companies whose 1-5 person in-house legal team regularly uses 2-5 outside counsel firms. AE is uniquely suited because the autopsy taxonomy was purpose-built for exactly these reasoning failure modes, and adversarial debate prevents the LLM-as-judge collapse that any single-model verifier hits when grading legal text.
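To make the gate concrete, here is a minimal sketch of how per-claim findings could be recorded and combined. It is an illustration under stated assumptions, not AE's implementation: the `Verdict` structure, the model names, and the rule that a single flag from any debater is enough to hold the claim are all hypothetical. Only the four pattern names come from the description above; the other two patterns of the six-pattern taxonomy are not listed there and are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailurePattern(Enum):
    # Only the four patterns named in the description; the remaining two of
    # the six-pattern taxonomy are not listed there, so they are omitted.
    FATAL_GROUNDING_IMMUNITY = "cited case does not exist"
    PREMISE_CONCLUSION_SEVERING = "holding does not support the proposition"
    CONCESSION_LAUNDERING = "dicta presented as binding precedent"
    TEMPORAL_TRANSMISSION_BLINDNESS = "superseded statute applied"


@dataclass(frozen=True)
class Verdict:
    """One debater's finding on one claim (hypothetical structure)."""
    model: str
    pattern: Optional[FailurePattern]  # None means no failure was found


def gate(verdicts: list[Verdict]) -> tuple[str, list[FailurePattern]]:
    """Return ("rely", []) only when no debater flags a failure pattern.

    Assumption: any single flag is enough to hold the claim for review.
    """
    flagged = sorted(
        {v.pattern for v in verdicts if v.pattern is not None},
        key=lambda p: p.name,
    )
    return ("hold", flagged) if flagged else ("rely", [])


# Hypothetical debate outcome for one pasted claim.
verdicts = [
    Verdict("model_a", None),
    Verdict("model_b", FailurePattern.CONCESSION_LAUNDERING),
    Verdict("model_c", None),
]
decision, patterns = gate(verdicts)
print(decision, [p.name for p in patterns])  # hold ['CONCESSION_LAUNDERING']
```

A stricter or looser rule (for example, requiring two debaters to agree before holding a claim) would change only the body of `gate`.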

Why did we consider it?

AE's reasoning-failure taxonomy and adversarial grading rig are purpose-built for the exact verification gap GCs face when relying on outside counsel's AI-drafted legal claims.

What breaks?

Risk Transfer Paradox: GCs hire outside counsel to offload liability; verifying that counsel's work in-house with an uninsurable tool forces the GC to re-assume malpractice risk.
Workflow Friction: Overworked 1-5 person in-house teams will not manually copy-paste citations from premium memos into a standalone verification engine.
Data Governance & Privilege: Pasting highly confidential legal strategy into a part-time solo developer's tool violates strict corporate security and attorney-client privilege norms.

What did we learn?

Still in evaluation (phase: ranked). No verdict yet.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis	What it measures
data moat	Does this product accumulate proprietary data that compounds?
10x model test	Does a better model make this more valuable, or redundant?
fast feedback loops	Can outputs be graded against reality in <30 days?
solo founder feasible	Can a solo operator build and run this without a team?
AI providers can't eat it	Do hyperscalers have structural reasons NOT to build this?

Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
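The arithmetic behind these figures can be sketched as follows. This is an illustration only: the per-run axis scores are invented, half-point scores are assumed to be allowed, and the aggregation order (sum the five axes per run, then take the median across runs) and the quartile convention are assumptions, since the page does not state them.

```python
from statistics import median, quantiles

GRADUATION_THRESHOLD = 9.0   # from the page
AXES_PER_RUN = 5             # five axes, each scored 0-3

# Hypothetical scores for three runs (invented for illustration;
# half-point granularity is an assumption).
runs = [
    [1.0, 2.0, 2.0, 2.0, 1.0],
    [1.5, 2.0, 2.0, 2.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 1.0],
]


def composite(run: list[float]) -> float:
    """Sum one run's axis scores after checking count and 0-3 range."""
    if len(run) != AXES_PER_RUN:
        raise ValueError("expected five axis scores")
    if any(not 0 <= score <= 3 for score in run):
        raise ValueError("axis scores must lie in 0-3")
    return sum(run)


totals = [composite(run) for run in runs]

composite_median = median(totals)
# statistics.quantiles defaults to the "exclusive" method; other quartile
# conventions give a different IQR for only three data points.
q1, _, q3 = quantiles(totals, n=4)
iqr = q3 - q1
graduates = composite_median >= GRADUATION_THRESHOLD

print(composite_median, iqr, graduates)  # 8.5 1.0 False
```

With these invented inputs the composite lands half a point below the 9.0 threshold, so the idea would not graduate.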

Evidence

Signal A — Primary source

https://arxiv.org/html/2412.14556v1 credibility: high

In this paper, we propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations.

Signal B — Competitor with documented gap

https://www.linkedin.com/posts/alineco_most-legal-ai-tools-get-opened-once-then-activity-7457763909442125825-UmIn

Aline validates that statute references point to the correct statute (single-model verification of citation existence), but the snippet shows no adversarial multi-model debate, no check of whether a cited holding actually supports the stated proposition, and no reasoning-failure taxonomy (hallucinated cases, dicta-as-precedent, superseded statutes). It addresses citation existence but not citation soundness.

Signal D — Demand proxy

Found: yes

Summary: Active discussion in legal practitioner and tech communities about judges sanctioning attorneys for AI-generated bogus citations, and a 2026 S.D.N.Y. case raising novel questions about AI use in legal practice. Both demonstrate acute awareness among legal professionals that AI-generated legal content requires pre-reliance verification.

Sources:
https://www.reddit.com/r/paralegal/comments/1rwf846/ai_in_the_office/
https://news.ycombinator.com/item?id=47778920

Reason: Reddit r/paralegal thread describes judges sanctioning counsel for submitting bogus AI cita…

Evaluation history

When	Stage	Phase
2026-05-14 00:42	evidence_search	ranked
2026-05-13 23:42	evidence_search	ranked
2026-05-13 23:37	evidence_search	ranked
2026-05-13 23:24	evidence_search	ranked
2026-05-13 23:19	evidence_search	ranked
2026-05-13 23:12	evidence_search	ranked
2026-05-13 23:00	filter_score	scored
2026-05-13 22:54	filter_score	scored
2026-05-13 22:48	filter_score	scored
2026-05-13 22:43	evidence_search	argument
2026-05-13 22:36	audience_simulation	argument
2026-05-13 22:24	red_team_kill	argument
2026-05-13 22:18	steelman	argument
2026-05-13 22:14	genesis	argument