
Recruiter Claim-to-Scorecard Ledger for In-House Talent Heads

graduated · [TRIANGULATED] · filter 10.5/15 · spread ±1.0 · signals: 2 independent
What is this?
A pre-send gate plus a per-agency claim-accuracy ledger for in-house heads of talent at 50-200 person, founder-led SaaS companies that use 2-5 external agencies. During onboarding, AE ingests the head's existing Greenhouse/Ashby/Lever scorecard rubric (typically 4-6 attributes such as Technical Depth, Leadership, Ownership, and Communication, each scored 1-4 by interviewers). When a recruiter submits a pitch, AE structures it into claim primitives aligned to those exact attributes, runs an adversarial multi-model debate to surface the weakest claim, and produces three sharp pre-interview interrogation questions. After the interview loop, the head reads the per-attribute interviewer scores already in the ATS scorecard and pastes 4-6 numeric values (under 60s), with no narrative remapping. Over 8-12 weeks, each agency accrues a per-attribute calibration ledger (e.g. 'Agency X reliable on tenure, inflates Leadership by 2.1 points vs interviewer average'). The ledger serves as a renewal-negotiation lever. Pricing: £200-500/mo, solo self-buy.
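The calibration ledger described above can be sketched as a running average of the gap between what a recruiter's pitch claimed and what interviewers scored. This is a minimal sketch, not AE's implementation: it assumes each claim has already been mapped to an implied score on the rubric's 1-4 scale, and the function name `build_ledger`, the record layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_ledger(records):
    """Return the mean (claimed - interviewer average) per (agency, attribute).

    Each record is (agency, attribute, claimed_score, interviewer_scores),
    with all scores on the rubric's 1-4 scale. A positive value means the
    agency's pitches overstated that attribute relative to interviewers.
    """
    gaps = defaultdict(list)
    for agency, attribute, claimed, interviewer_scores in records:
        gaps[(agency, attribute)].append(claimed - mean(interviewer_scores))
    return {key: round(mean(values), 2) for key, values in gaps.items()}


# Hypothetical data: two candidates from one agency, pasted after the loop.
records = [
    ("Agency X", "Leadership", 4, [2, 2, 2]),
    ("Agency X", "Leadership", 4, [2, 1, 2]),
    ("Agency X", "Ownership", 3, [3, 3]),
]

ledger = build_ledger(records)
for (agency, attribute), gap in sorted(ledger.items()):
    print(f"{agency} / {attribute}: {gap:+.2f} vs interviewer average")
```

With only a handful of candidates per agency, each ledger entry rests on very few observations, so a real version would likely need to report the sample size alongside the gap.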
Why did we consider it?
AE's prediction-grading and adversarial-debate engine becomes a per-agency accuracy ledger that in-house talent heads buy as a renewal-negotiation lever, fitting a 2026 budget line, a sub-60s paste workflow, and a solo £100-300K-ARR path.
What breaks?
  • Insufficient data volume: 50-200 person startups do not process enough agency candidates to generate statistically significant calibration ledgers.
  • False market leverage: Startups cannot use scorecards to negotiate down contingency fees for hard-to-fill roles; agencies will simply route candidates to competitors.
  • Workflow friction: Manual ATS-to-AE data pasting will be abandoned by overworked TA heads, aligning with Gartner's reported 47% AI HR tool failure rate.
What did we learn?
Engine verdict: ESCALATED (MUST_READ). Council could not converge after 3 rounds; human decision required.

Filter scores

Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.

Axis | What it measures
data moat | Does this product accumulate proprietary data that compounds?
10x model test | Does a better model make this more valuable, or redundant?
fast feedback loops | Can outputs be graded against reality in <30 days?
solo founder feasible | Can a solo operator build and run this without a team?
AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this?
Composite median: 10.5 / 15. Graduation threshold: 9.0. IQR across runs: 1.0.
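One way the composite figures could be derived is sketched below. The source does not say how the composite is aggregated, so this assumes it is the median of per-run totals, that the IQR uses inclusive linear interpolation, that half-point scores are allowed, and that graduation means meeting or exceeding the threshold. The axis keys and the three runs are hypothetical values chosen only to illustrate the arithmetic; they are not the actual run scores.

```python
from statistics import median, quantiles

# Hypothetical short keys for the five axes in the table above.
AXES = ("data_moat", "ten_x_model", "fast_feedback", "solo_feasible", "provider_proof")
GRADUATION_THRESHOLD = 9.0  # from the page; the >= comparison is an assumption


def summarize(runs):
    """Aggregate per-run axis scores (0-3 each) into composite statistics."""
    totals = sorted(sum(run[axis] for axis in AXES) for run in runs)
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    composite = median(totals)
    return {
        "composite_median": composite,
        "iqr": q3 - q1,
        "graduated": composite >= GRADUATION_THRESHOLD,
    }


# Illustrative runs only (assumed half-point granularity).
runs = [
    dict(zip(AXES, (2, 2, 2, 2, 1.5))),
    dict(zip(AXES, (2, 2.5, 2, 2, 2))),
    dict(zip(AXES, (2.5, 2.5, 2, 2.5, 2))),
]

summary = summarize(runs)
print(summary)
```

Taking the median of run totals rather than summing per-axis medians keeps each run's scores together; the other aggregation is equally plausible and could give a different composite.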

Evidence

Signal B — Competitor with documented gap

Greenhouse offers structured hiring with scorecards (kickoff alignment, sourcing, scorecards) but has no pre-send claim gate for agency pitches, no adversarial claim decomposition against scorecard attributes, and no per-agency calibration ledger tracking claim accuracy over time. The scorecard lives inside the ATS as an interviewer evaluation tool, not as an agency accountability mechanism.

Signal D — Demand proxy

Weak demand signals: a Reddit r/recruitinghell thread surfaces systemic frustration with opaque recruiting practices and lack of accountability mechanisms; LinkedIn posts discuss AI disruption of talent strategy; multiple content pieces (ERE, HackerEarth, InsideArm) argue for recruiter performance scorecards and measurable accountability, indicating the talent-ops audience already thinks in scorecard terms but lacks agency-specific calibration tooling.
Sources: https://www.reddit.com/r/recruitinghell/comments/1n8dv4g/job_hunters_heres_whats_really_happening_right/, …

Evaluation history

When | Stage | Phase
2026-05-13 07:26 | deep_council_verdict | graduated
2026-05-13 07:24 | deep_claude_take | graduated
2026-05-13 07:22 | deep_90day_plan | graduated
2026-05-13 07:21 | deep_risk | graduated
2026-05-13 07:19 | deep_distribution | graduated
2026-05-13 07:18 | deep_pricing | graduated
2026-05-13 07:17 | deep_moat | graduated
2026-05-13 07:16 | deep_buyer_sim | graduated
2026-05-13 07:15 | deep_icp | graduated
2026-05-13 07:14 | deep_competitor | graduated
2026-05-13 07:13 | deep_market_reality | graduated
2026-05-13 07:06 | filter_score | scored
2026-05-13 07:00 | filter_score | scored
2026-05-13 06:54 | filter_score | scored
2026-05-13 06:49 | evidence_search | argument
2026-05-13 01:24 | evidence_search | argument
2026-05-12 23:30 | evidence_search | argument
2026-05-12 21:42 | evidence_search | argument
2026-05-12 19:54 | evidence_search | argument
2026-05-12 17:54 | evidence_search | argument
2026-05-12 16:12 | evidence_search | argument
2026-05-12 14:18 | evidence_search | argument
2026-05-12 12:36 | evidence_search | argument
2026-05-12 10:42 | evidence_search | argument
2026-05-12 09:00 | evidence_search | argument
2026-05-12 07:12 | evidence_search | argument
2026-05-12 05:24 | evidence_search | argument
2026-05-11 19:18 | evidence_search | argument
2026-05-11 17:54 | evidence_search | argument
2026-05-11 16:24 | evidence_search | argument
2026-05-11 15:00 | evidence_search | argument
2026-05-11 13:24 | evidence_search | argument
2026-05-11 06:30 | evidence_search | argument
2026-05-11 05:18 | evidence_search | argument
2026-05-10 23:18 | evidence_search | argument
2026-05-10 23:06 | evidence_search | argument
2026-05-10 23:00 | evidence_search | argument
2026-05-10 22:54 | evidence_search | argument
2026-05-10 22:48 | evidence_search | argument
2026-05-10 22:24 | evidence_search | argument
2026-05-10 22:18 | evidence_search | argument
2026-05-10 22:12 | evidence_search | argument
2026-05-10 22:06 | audience_simulation | argument
2026-05-10 21:54 | red_team_kill | argument
2026-05-10 21:48 | steelman | argument
2026-05-10 21:44 | genesis | argument