Recruiter Claim-to-Scorecard Ledger for In-House Talent Heads
graduated [TRIANGULATED] · filter 10.5/15 · spread ±1.0 · signals: 2 independent
What is this?
A pre-send gate plus a per-agency claim-accuracy ledger for in-house heads of talent at founder-led SaaS companies with 50-200 people and 2-5 external agencies. During onboarding, AE ingests the head's existing Greenhouse/Ashby/Lever scorecard rubric (typically 4-6 attributes such as Technical Depth, Leadership, Ownership, and Communication, each scored 1-4 by interviewers). When a recruiter submits a pitch, AE structures it into claim primitives aligned to those exact attributes, runs an adversarial multi-model debate to surface the weakest claim, and produces three sharp pre-interview interrogation questions. After the interview loop, the head reads the per-attribute interviewer scores already in the ATS scorecard and pastes in 4-6 numeric values (under 60s), with no narrative remapping. Over 8-12 weeks, each agency accrues a per-attribute calibration ledger ('Agency X reliable on tenure, inflates Leadership by 2.1 points vs interviewer average'). The ledger is positioned as a renewal-negotiation lever, priced at £200-500/mo as a solo self-buy.
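The per-attribute ledger described above can be sketched as follows. This is a minimal illustration, assuming each recruiter claim has already been mapped to a number on the same 1-4 scale as the interviewer scores. That mapping, the record layout, the function name `build_ledger`, and the sample values are all hypothetical and not taken from AE.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (agency, attribute, claimed_score, interviewer_avg),
# all on the 1-4 scorecard scale. Sample values are invented for illustration.
observations = [
    ("Agency X", "Leadership", 4.0, 2.0),
    ("Agency X", "Leadership", 4.0, 1.5),
    ("Agency X", "Technical Depth", 3.0, 3.0),
    ("Agency Y", "Leadership", 3.0, 3.5),
]


def build_ledger(rows):
    """Return {(agency, attribute): (mean claim-minus-interview gap, sample count)}."""
    gaps = defaultdict(list)
    for agency, attribute, claimed, interviewed in rows:
        # Positive gap: the agency claimed more than interviewers scored.
        gaps[(agency, attribute)].append(claimed - interviewed)
    return {key: (round(mean(vals), 2), len(vals)) for key, vals in gaps.items()}


ledger = build_ledger(observations)
for (agency, attribute), (gap, n) in sorted(ledger.items()):
    print(f"{agency} | {attribute}: gap {gap:+.2f} over {n} candidate(s)")
```

A positive gap means the agency's claims ran above what interviewers scored. Keeping the sample count beside each mean makes it visible when an entry rests on too few candidates to trust.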
Why did we consider it?
AE's prediction-grading and adversarial-debate engine becomes a per-agency accuracy ledger that in-house talent heads buy as a renewal-negotiation lever, fitting a 2026 budget line, a sub-60s paste workflow, and a solo £100-300K-ARR path.
What breaks?
- Insufficient data volume: 50-200 person startups do not process enough agency candidates to generate statistically significant calibration ledgers.
- False market leverage: Startups cannot use scorecards to negotiate down contingency fees for hard-to-fill roles; agencies will simply route candidates to competitors.
- Workflow friction: Overworked TA heads will abandon manual ATS-to-AE data pasting, consistent with Gartner's reported 47% failure rate for AI HR tools.
What did we learn?
Engine verdict: ESCALATED (MUST_READ). Council could not converge after 3 rounds; human decision required.
Filter scores
Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.
| Axis | What it measures |
|---|---|
| data moat | Does this product accumulate proprietary data that compounds? |
| 10x model test | Does a better model make this more valuable, or redundant? |
| fast feedback loops | Can outputs be graded against reality in <30 days? |
| solo founder feasible | Can a solo operator build and run this without a team? |
| AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this? |
Composite median: 10.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
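One way the composite figures could be derived is sketched below. It assumes the composite is the median of the per-run totals and that the IQR uses inclusive quartiles; the source states neither. The per-run axis scores, including the half-point values, are invented for illustration and are not the actual run data.

```python
from statistics import median, quantiles

GRADUATION_THRESHOLD = 9.0  # threshold quoted in the text above

# Hypothetical per-run axis scores (0-3 each); NOT the actual runs.
runs = [
    {"data moat": 2, "10x model test": 2, "fast feedback loops": 2,
     "solo founder feasible": 2, "AI providers can't eat it": 2},
    {"data moat": 2, "10x model test": 2.5, "fast feedback loops": 2,
     "solo founder feasible": 2, "AI providers can't eat it": 2},
    {"data moat": 3, "10x model test": 2, "fast feedback loops": 2,
     "solo founder feasible": 2.5, "AI providers can't eat it": 2.5},
]

# Total each run across the five axes (maximum 15 per run).
totals = [sum(run.values()) for run in runs]

# Assumed definition: composite = median of the per-run totals.
composite = median(totals)

# Assumed definition: IQR from inclusive quartiles of the per-run totals.
q1, _, q3 = quantiles(totals, n=4, method="inclusive")
iqr = q3 - q1

graduated = composite >= GRADUATION_THRESHOLD
print(f"composite={composite} / 15, IQR={iqr}, graduated={graduated}")
```

With only three runs, the quartile method chosen changes the IQR noticeably, so the spread is best read as a rough disagreement indicator and not as a precise statistic.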
Evidence
Signal B — Competitor with documented gap
Greenhouse offers structured hiring with scorecards (kickoff alignment, sourcing, scorecards) but has no pre-send claim gate for agency pitches, no adversarial claim decomposition against scorecard attributes, and no per-agency calibration ledger tracking claim accuracy over time. The scorecard lives inside the ATS as an interviewer evaluation tool, not as an agency accountability mechanism.
Signal D — Demand proxy
Weak demand signals: a Reddit r/recruitinghell thread surfaces systemic frustration with opaque recruiting practices and a lack of accountability mechanisms; LinkedIn posts discuss AI disruption of talent strategy; multiple content pieces (ERE, HackerEarth, InsideArm) argue for recruiter performance scorecards and measurable accountability. This indicates the talent-ops audience already thinks in scorecard terms but lacks agency-specific calibration tooling.
Sources: https://www.reddit.com/r/recruitinghell/comments/1n8dv4g/job_hunters_heres_whats_really_happening_right/, …
Evaluation history
| When | Stage | Phase |
|---|---|---|
| 2026-05-13 07:26 | deep_council_verdict | graduated |
| 2026-05-13 07:24 | deep_claude_take | graduated |
| 2026-05-13 07:22 | deep_90day_plan | graduated |
| 2026-05-13 07:21 | deep_risk | graduated |
| 2026-05-13 07:19 | deep_distribution | graduated |
| 2026-05-13 07:18 | deep_pricing | graduated |
| 2026-05-13 07:17 | deep_moat | graduated |
| 2026-05-13 07:16 | deep_buyer_sim | graduated |
| 2026-05-13 07:15 | deep_icp | graduated |
| 2026-05-13 07:14 | deep_competitor | graduated |
| 2026-05-13 07:13 | deep_market_reality | graduated |
| 2026-05-13 07:06 | filter_score | scored |
| 2026-05-13 07:00 | filter_score | scored |
| 2026-05-13 06:54 | filter_score | scored |
| 2026-05-13 06:49 | evidence_search | argument |
| 2026-05-13 01:24 | evidence_search | argument |
| 2026-05-12 23:30 | evidence_search | argument |
| 2026-05-12 21:42 | evidence_search | argument |
| 2026-05-12 19:54 | evidence_search | argument |
| 2026-05-12 17:54 | evidence_search | argument |
| 2026-05-12 16:12 | evidence_search | argument |
| 2026-05-12 14:18 | evidence_search | argument |
| 2026-05-12 12:36 | evidence_search | argument |
| 2026-05-12 10:42 | evidence_search | argument |
| 2026-05-12 09:00 | evidence_search | argument |
| 2026-05-12 07:12 | evidence_search | argument |
| 2026-05-12 05:24 | evidence_search | argument |
| 2026-05-11 19:18 | evidence_search | argument |
| 2026-05-11 17:54 | evidence_search | argument |
| 2026-05-11 16:24 | evidence_search | argument |
| 2026-05-11 15:00 | evidence_search | argument |
| 2026-05-11 13:24 | evidence_search | argument |
| 2026-05-11 06:30 | evidence_search | argument |
| 2026-05-11 05:18 | evidence_search | argument |
| 2026-05-10 23:18 | evidence_search | argument |
| 2026-05-10 23:06 | evidence_search | argument |
| 2026-05-10 23:00 | evidence_search | argument |
| 2026-05-10 22:54 | evidence_search | argument |
| 2026-05-10 22:48 | evidence_search | argument |
| 2026-05-10 22:24 | evidence_search | argument |
| 2026-05-10 22:18 | evidence_search | argument |
| 2026-05-10 22:12 | evidence_search | argument |
| 2026-05-10 22:06 | audience_simulation | argument |
| 2026-05-10 21:54 | red_team_kill | argument |
| 2026-05-10 21:48 | steelman | argument |
| 2026-05-10 21:44 | genesis | argument |