Pre-Interview Probe Pack for In-House Recruiting Leads
Status: ranked · [TRIANGULATED] · filter 8.5/15 · spread ±0.5 · signals: 3 independent
What is this?
A pre-interview tool for heads of talent at founder-led SaaS companies of 30-150 people who hire through external search firms or, increasingly, AI-polished sourcing services. When a recruiter sends a rationale of the form 'this candidate is a 9/10 fit because X, Y, Z', the head of talent pastes it in. AE's adversarial multi-model debate then generates 5-8 behavioural probes engineered to falsify the rationale's strongest claims, rather than generic interview prompts. The probes drop into the interview-loop scorecard template. After the interviews, the head of talent records the scorecard verdicts and, at 90 days, the hire's retention status. Over 3-6 hires per recruiter, the tool builds a per-recruiter rationale-vs-reality ledger showing whose rationales survive probing and whose collapse. AE is uniquely suited to this because adversarial debate generates probes that try to break a claim rather than confirm it, and its code-enforced grading loop ties probe outcomes to ATS scorecard labels objectively (not via LLM-as-judge). The ledger uses AE's lifecycle states to promote, demote, or kill a recruiter's rationale credibility over months.
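One way to picture the per-recruiter ledger described above is the minimal sketch below. It is not AE's actual data model: the class names, the state names, the `min_hires` default, and the 0.7 / 0.4 cut-offs are all assumptions made for illustration. The page only says that lifecycle states promote, demote, or kill a recruiter's rationale credibility.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Credibility(Enum):
    # Hypothetical state names; the page names the transitions, not the states.
    PROBATION = "probation"
    PROMOTED = "promoted"
    DEMOTED = "demoted"
    KILLED = "killed"


@dataclass
class HireOutcome:
    candidate: str
    claims_probed: int                   # rationale claims targeted by probes
    claims_survived: int                 # claims that held up per scorecard verdicts
    retained_90d: Optional[bool] = None  # stays None until the 90-day check


@dataclass
class RecruiterLedger:
    recruiter: str
    outcomes: List[HireOutcome] = field(default_factory=list)

    def survival_rate(self) -> Optional[float]:
        """Share of probed claims that survived, pooled across all hires."""
        probed = sum(o.claims_probed for o in self.outcomes)
        if probed == 0:
            return None
        return sum(o.claims_survived for o in self.outcomes) / probed

    def state(self, min_hires: int = 3) -> Credibility:
        """Map the pooled survival rate to a state (thresholds are assumed)."""
        rate = self.survival_rate()
        if rate is None or len(self.outcomes) < min_hires:
            return Credibility.PROBATION
        if rate >= 0.7:
            return Credibility.PROMOTED
        if rate >= 0.4:
            return Credibility.DEMOTED
        return Credibility.KILLED


ledger = RecruiterLedger("Recruiter A")
ledger.outcomes.append(HireOutcome("cand-1", claims_probed=6, claims_survived=5))
ledger.outcomes.append(HireOutcome("cand-2", claims_probed=6, claims_survived=4))
early_state = ledger.state()   # fewer than min_hires, so still on probation
ledger.outcomes.append(HireOutcome("cand-3", claims_probed=6, claims_survived=5))
later_state = ledger.state()   # 14 of 18 claims survived
```

In this sketch the state is driven by scorecard-based survival counts alone, and `retained_90d` is only stored. How retention should feed the state is left open because the page does not specify it.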
Why did we consider it?
AE's adversarial debate plus objective lifecycle grading uniquely produces falsifying interview probes and a per-recruiter credibility ledger that prep-packet and ATS incumbents cannot replicate.
What breaks?
- Feedback Loop Mismatch: The AE's core strength is sub-24h grading, but the hypothesis relies on 90-day retention metrics and multi-week interview loops, neutralizing the engine's speed.
- Breaks Structured Interviewing: Generating bespoke adversarial probes per candidate destroys the standardized scorecard rubrics required for objective comparison and compliance.
- Statistically Insignificant Volume: A 30-150 person SaaS company does not make enough hires through any individual external recruiter to build a meaningful 'credibility ledger' based on 90-day retention.
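The volume objection in the last bullet can be made concrete with a confidence interval. The sketch below uses the standard Wilson score interval for a binomial proportion. The choice of this interval and the example of 4 retained hires out of 5 are illustrative assumptions, not figures from the evaluation.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical recruiter: 5 hires placed, 4 still employed at 90 days.
low, high = wilson_interval(successes=4, n=5)
print(f"90-day retention plausibly between {low:.2f} and {high:.2f}")
```

With only five hires, the interval runs from roughly 0.38 to 0.96, so a recruiter with an observed 80% retention rate cannot be distinguished from one whose true rate is far lower or far higher.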
What did we learn?
Still in evaluation (phase: ranked). No verdict yet.
Filter scores
Five axes, each scored 0-3. Three independent runs by different model perspectives. Median shown.
| Axis | What it measures |
|---|---|
| data moat | Does this product accumulate proprietary data that compounds? |
| 10x model test | Does a better model make this more valuable, or redundant? |
| fast feedback loops | Can outputs be graded against reality in <30 days? |
| solo founder feasible | Can a solo operator build and run this without a team? |
| AI providers can't eat it | Do hyperscalers have structural reasons NOT to build this? |
Composite median: 8.5 / 15. Graduation threshold: 9.0. IQR across runs: 0.5.
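The arithmetic behind the composite line can be sketched as follows. The page does not report per-run totals, so the three values below are made up purely to show the calculation. Taking the median of run totals, using the inclusive quantile method for the IQR, and graduating on "greater than or equal to" the threshold are all assumptions.

```python
from statistics import median, quantiles

GRADUATION_THRESHOLD = 9.0  # stated on the page
MAX_COMPOSITE = 15          # five axes, each scored 0-3


def summarize(run_totals):
    """Median, IQR, and threshold check for per-run composite scores."""
    q1, _, q3 = quantiles(run_totals, n=4, method="inclusive")
    composite = median(run_totals)
    return {
        "median": composite,
        "iqr": q3 - q1,
        # Assumption: graduation means meeting or exceeding the threshold.
        "graduates": composite >= GRADUATION_THRESHOLD,
    }


# Hypothetical per-run totals (not from the page), one per model perspective.
hypothetical_totals = [8.0, 8.5, 9.0]
summary = summarize(hypothetical_totals)
print(summary)
```

Under these assumptions a composite median of 8.5 falls below the 9.0 graduation threshold, which is consistent with the hypothesis remaining in the ranked phase.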
Evidence
Signal A — Primary source
This study introduces a benchmarking methodology designed to evaluate the performance of AI-driven recruitment sourcing tools.
Signal B — Competitor with documented gap
GoPerfect provides generic pre-screening interview questions and hiring optimization strategies but does not generate adversarial behavioural probes tailored to falsify a specific recruiter rationale, nor does it build a per-recruiter rationale-vs-reality ledger tracking whose candidate claims survive structured probing over multiple hires.
Signal D — Demand proxy
Active Reddit and HN discussion from in-house recruiters about lack of structured interview rubrics, AI-polished candidates undermining traditional screening, and practitioners building ad-hoc AI recruiting pipelines, all indicating unmet demand for tools that validate recruiter claims and adapt interview processes to AI-era sourcing.
Sources:
- https://www.reddit.com/r/recruiting/comments/16d47bm/question_for_inhouse_recruiters/
- https://news.ycombinator.com/item?id=42909166
- https://www.reddit.com/r/recruiting/comments/1rcfjs5/my_current_ai_recruiting_copilot_pi…
Evaluation history
| When | Stage | Phase |
|---|---|---|
| 2026-05-13 21:43 | evidence_search | ranked |
| 2026-05-13 16:24 | evidence_search | ranked |
| 2026-05-10 15:42 | evidence_search | ranked |
| 2026-05-10 15:00 | evidence_search | ranked |
| 2026-05-10 14:18 | evidence_search | ranked |
| 2026-05-10 13:36 | evidence_search | ranked |
| 2026-05-10 12:54 | evidence_search | ranked |
| 2026-05-10 12:12 | evidence_search | ranked |
| 2026-05-10 11:24 | evidence_search | ranked |
| 2026-05-10 09:43 | evidence_search | ranked |
| 2026-05-10 09:13 | evidence_search | ranked |
| 2026-05-10 08:55 | evidence_search | ranked |
| 2026-05-10 03:48 | filter_score | scored |
| 2026-05-10 03:42 | filter_score | scored |
| 2026-05-10 03:36 | filter_score | scored |
| 2026-05-10 03:30 | evidence_search | argument |
| 2026-05-10 03:24 | audience_simulation | argument |
| 2026-05-10 03:18 | red_team_kill | argument |
| 2026-05-10 03:12 | steelman | argument |
| 2026-05-10 03:08 | genesis | argument |