Recruiter Claim-to-Scorecard Ledger for In-House Talent Heads

graduated [TRIANGULATED]; filter 10.5/15; spread ±1.0; signals: 2 independent

What is this?

A pre-send gate plus a per-agency claim-accuracy ledger for in-house heads of talent at 50-200 person, founder-led SaaS companies that work with 2-5 external agencies. During onboarding, AE ingests the head's existing Greenhouse, Ashby, or Lever scorecard rubric (typically 4-6 attributes such as Technical Depth, Leadership, Ownership, and Communication, each scored 1-4 by interviewers). When a recruiter submits a pitch, AE structures it into claim primitives aligned to those exact attributes, runs an adversarial multi-model debate to surface the weakest claim, and produces three sharp pre-interview interrogation questions. After the interview loop, the head reads the per-attribute interviewer scores already in the ATS scorecard and pastes the 4-6 numeric values into AE (under 60 seconds), with no narrative remapping. Over 8-12 weeks, each agency accrues a per-attribute calibration ledger (for example, 'Agency X reliable on tenure, inflates Leadership by 2.1 points vs interviewer average'), which the head can use as a renewal-negotiation lever. Priced at £200-500/mo as a solo self-buy.
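The ledger idea above can be sketched in a few lines. This is a minimal illustration, not AE's actual implementation: the `ClaimOutcome` schema, the `calibration_ledger` function, and the assumption that each pitch claim has already been mapped onto the same 1-4 scale as the scorecard are all hypothetical, and the sample records are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClaimOutcome:
    """One recruiter claim paired with its interview result (hypothetical schema)."""
    agency: str
    attribute: str          # a scorecard attribute, e.g. "Leadership"
    claimed_score: float    # ASSUMPTION: pitch claim already mapped onto the 1-4 scale
    interviewer_avg: float  # per-attribute interviewer average pasted from the ATS


def calibration_ledger(outcomes):
    """Return the mean (claimed - interviewed) gap per (agency, attribute).

    Positive values mean the agency's pitches overstate that attribute.
    """
    gaps = defaultdict(list)
    for o in outcomes:
        gaps[(o.agency, o.attribute)].append(o.claimed_score - o.interviewer_avg)
    return {key: round(mean(values), 2) for key, values in gaps.items()}


# Illustrative, made-up records; not data from this report.
sample = [
    ClaimOutcome("Agency X", "Leadership", 4.0, 2.0),
    ClaimOutcome("Agency X", "Leadership", 4.0, 1.5),
    ClaimOutcome("Agency X", "Ownership", 3.0, 3.0),
]
ledger = calibration_ledger(sample)
```

With these sample records, `ledger` maps `("Agency X", "Leadership")` to a positive gap and `("Agency X", "Ownership")` to zero. A signed mean is used here so that inflation and under-selling stay distinguishable; whether the real ledger uses a mean, a median, or something else is not stated in the report.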

Why did we consider it?

AE's prediction-grading and adversarial-debate engine becomes a per-agency accuracy ledger that in-house talent heads buy as a renewal-negotiation lever, fitting a 2026 budget line, a sub-60s paste workflow, and a solo £100-300K-ARR path.

What breaks?

Insufficient data volume: 50-200 person startups do not process enough agency candidates to generate statistically significant calibration ledgers.
False market leverage: Startups cannot use scorecards to negotiate down contingency fees for hard-to-fill roles; agencies will simply route candidates to competitors.
Workflow friction: Overworked TA heads will abandon manual ATS-to-AE data pasting, consistent with Gartner's reported 47% failure rate for AI HR tools.

What did we learn?

Engine verdict: ESCALATED (MUST_READ). Council could not converge after 3 rounds; human decision required.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis	What it measures
data moat	Does this product accumulate proprietary data that compounds?
10x model test	Does a better model make this more valuable, or redundant?
fast feedback loops	Can outputs be graded against reality in <30 days?
solo founder feasible	Can a solo operator build and run this without a team?
AI providers can't eat it	Do hyperscalers have structural reasons NOT to build this?

Composite median: 10.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
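As a rough sketch of how a composite like this could be derived: each run sums its five axis scores, and the median of the three run totals is compared with the graduation threshold. The per-run scores below are hypothetical (the report shows only the median), the function names are illustrative, and taking the median of run totals rather than summing per-axis medians is an assumption.

```python
from statistics import median

GRADUATION_THRESHOLD = 9.0  # threshold stated in the report

AXES = (
    "data moat",
    "10x model test",
    "fast feedback loops",
    "solo founder feasible",
    "AI providers can't eat it",
)


def composite(run_scores):
    """Sum one run's axis scores, checking each lies in the 0-3 range."""
    for axis in AXES:
        if not 0 <= run_scores[axis] <= 3:
            raise ValueError(f"score out of range for axis: {axis}")
    return sum(run_scores[axis] for axis in AXES)


def composite_median(runs):
    """Median of per-run composites (ASSUMPTION: not a sum of per-axis medians)."""
    return median(composite(run) for run in runs)


# Hypothetical per-run scores, chosen only to illustrate the arithmetic.
runs = [
    dict(zip(AXES, (2.0, 2.0, 2.0, 2.0, 2.0))),
    dict(zip(AXES, (2.5, 2.0, 2.0, 2.0, 2.0))),
    dict(zip(AXES, (2.5, 2.0, 2.5, 2.0, 2.0))),
]
score = composite_median(runs)
graduated = score >= GRADUATION_THRESHOLD
```

With these made-up inputs the three run totals are 10, 10.5, and 11, so the median clears the 9.0 threshold.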

Evidence

Signal B — Competitor with documented gap

https://www.greenhouse.com/blog/hiring-top-talent-playbook

Greenhouse offers structured hiring with scorecards (kickoff alignment, sourcing, scorecards) but has no pre-send claim gate for agency pitches, no adversarial claim decomposition against scorecard attributes, and no per-agency calibration ledger tracking claim accuracy over time. The scorecard lives inside the ATS as an interviewer evaluation tool, not as an agency accountability mechanism.

Signal D — Demand proxy

Weak demand signals. A Reddit r/recruitinghell thread surfaces systemic frustration with opaque recruiting practices and a lack of accountability mechanisms; LinkedIn posts discuss AI disruption of talent strategy; and multiple content pieces (ERE, HackerEarth, InsideArm) argue for recruiter performance scorecards and measurable accountability. Together these indicate that the talent-ops audience already thinks in scorecard terms but lacks agency-specific calibration tooling.

Sources: https://www.reddit.com/r/recruitinghell/comments/1n8dv4g/job_hunters_heres_whats_really_happening_right/, …

Evaluation history

When	Stage	Phase
2026-05-13 07:26	deep_council_verdict	graduated
2026-05-13 07:24	deep_claude_take	graduated
2026-05-13 07:22	deep_90day_plan	graduated
2026-05-13 07:21	deep_risk	graduated
2026-05-13 07:19	deep_distribution	graduated
2026-05-13 07:18	deep_pricing	graduated
2026-05-13 07:17	deep_moat	graduated
2026-05-13 07:16	deep_buyer_sim	graduated
2026-05-13 07:15	deep_icp	graduated
2026-05-13 07:14	deep_competitor	graduated
2026-05-13 07:13	deep_market_reality	graduated
2026-05-13 07:06	filter_score	scored
2026-05-13 07:00	filter_score	scored
2026-05-13 06:54	filter_score	scored
2026-05-13 06:49	evidence_search	argument
2026-05-13 01:24	evidence_search	argument
2026-05-12 23:30	evidence_search	argument
2026-05-12 21:42	evidence_search	argument
2026-05-12 19:54	evidence_search	argument
2026-05-12 17:54	evidence_search	argument
2026-05-12 16:12	evidence_search	argument
2026-05-12 14:18	evidence_search	argument
2026-05-12 12:36	evidence_search	argument
2026-05-12 10:42	evidence_search	argument
2026-05-12 09:00	evidence_search	argument
2026-05-12 07:12	evidence_search	argument
2026-05-12 05:24	evidence_search	argument
2026-05-11 19:18	evidence_search	argument
2026-05-11 17:54	evidence_search	argument
2026-05-11 16:24	evidence_search	argument
2026-05-11 15:00	evidence_search	argument
2026-05-11 13:24	evidence_search	argument
2026-05-11 06:30	evidence_search	argument
2026-05-11 05:18	evidence_search	argument
2026-05-10 23:18	evidence_search	argument
2026-05-10 23:06	evidence_search	argument
2026-05-10 23:00	evidence_search	argument
2026-05-10 22:54	evidence_search	argument
2026-05-10 22:48	evidence_search	argument
2026-05-10 22:24	evidence_search	argument
2026-05-10 22:18	evidence_search	argument
2026-05-10 22:12	evidence_search	argument
2026-05-10 22:06	audience_simulation	argument
2026-05-10 21:54	red_team_kill	argument
2026-05-10 21:48	steelman	argument
2026-05-10 21:44	genesis	argument