Wire v2 axes A1–A10 into hypothesis_engine/moves/filter_score.js (shadow-first)

council rejected · ARCHITECTURE · reversible: medium · 12h · proposed 19 May 2026

What is the proposed change?

filter_score.js currently scores 5 v1 axes (data_moat, ten_x_model_test, fast_feedback_loops, solo_founder_feasible, ai_providers_cant_eat_it) using the highSystem/lowSystem adversarial prompt pair. The v2_a1..v2_a10 DB columns already exist from the s112 migration, but nothing writes to them.

Phase 1 (shadow): add a second scoring pass inside each of the 3 filter_score runs. This pass evaluates all 10 v2 axes (A1–A10, as specified in brain/V2_FILTER_DESIGN_v2.3.md) through the same GPT/Gemini adversarial mechanism and writes the per-axis medians to v2_a1..v2_a10. The ABSOLUTE_FLOOR=9 graduation gate stays on the existing v1 composite formula; v2 shadow scores are logged only and do not affect graduation.

Phase 2 (after 2 shadow cycles, ~14 days): in scheduler.js, switch the graduation gate from the v1 composite formula to the v2 composite (median_total_v2 - 0.5*IQR_v2 + 0.3*signal_count) with ABSOLUTE_FLOOR_V2=18. The new floor is proportional: 18 is 60% of the 30-point v2 maximum, just as 9 is 60% of the 15-point v1 maximum. Retire v1 axis scoring from the highSystem/lowSystem prompts.

Proposer coupling: the proposer-side v2_evidence fields (pain_evidence, reachable_channel, build_complexity_estimate, workflow_cadence, per V2_FILTER_DESIGN_v2.3.md §proposer coupling) must be added to genesis.js PROPOSER_SYSTEM concurrently.
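A minimal sketch of the Phase 2 gate arithmetic, in JavaScript to match the engine's files. It rests on assumptions the proposal does not confirm: each of the 3 filter_score runs yields ten integer scores in 0..3 for A1..A10; the per-axis medians, median_total_v2 and IQR_v2 are all taken across runs using linear-interpolation quantiles; and signal_count arrives precomputed. The names quantile, perAxisMedians and v2Composite are hypothetical and do not come from filter_score.js or scheduler.js.

```javascript
// Sketch of the proposed v2 graduation gate (Phase 2).
// ASSUMPTIONS, not confirmed by the proposal: each run yields ten integer
// scores in 0..3 for A1..A10; medians and IQR are taken across runs using
// linear-interpolation quantiles; signalCount is computed elsewhere.

const V2_AXIS_COUNT = 10;
const ABSOLUTE_FLOOR_V2 = 18; // 60% of the 30-point maximum (10 axes x 3)

// Linear-interpolation quantile of an already sorted numeric array.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

const ascending = (a, b) => a - b;

// Per-axis medians across runs: the values Phase 1 would write to v2_a1..v2_a10.
function perAxisMedians(runs) {
  return Array.from({ length: V2_AXIS_COUNT }, (_, i) =>
    quantile(runs.map((axes) => axes[i]).sort(ascending), 0.5)
  );
}

// v2 composite: median_total_v2 - 0.5 * IQR_v2 + 0.3 * signal_count.
function v2Composite(runs, signalCount) {
  const totals = runs
    .map((axes) => axes.reduce((sum, score) => sum + score, 0))
    .sort(ascending);
  const medianTotal = quantile(totals, 0.5);
  const iqr = quantile(totals, 0.75) - quantile(totals, 0.25);
  const composite = medianTotal - 0.5 * iqr + 0.3 * signalCount;
  return { medianTotal, iqr, composite, graduates: composite >= ABSOLUTE_FLOOR_V2 };
}
```

For example, three runs totalling 15, 16 and 17 with a signal_count of 2 give a median of 16, an IQR of 1 and a composite of about 16.1, below ABSOLUTE_FLOOR_V2. Under this reading, the IQR term penalises disagreement between runs, so a candidate needs both a high median and consistent scoring to graduate.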

Target files

hypothesis_engine/moves/filter_score.js
hypothesis_engine/scheduler.js

Expected effect

The 4 Commander-overridden hypotheses (a38d31, c89a71 and 6bf9c5 killed; e9cb5c deferred) should score <18/30 on the v2 composite because they fail scalable_revenue (audit-shaped), distribution_reachability (the override reasons note no warm-contact base) and commander_non_engine_work_fit. Under v1 all four scored ≥9/15, so the engine passed them and the Commander overrode it. The spread between ROBUST candidates (ec4507, 3656a0) and the Commander-overridden candidates should widen from ≤4 points on v1 to ≥10 points on v2.

Falsifier — what would prove this wrong?

After the Phase 1 shadow period, Codex back-scores the 4 Commander-overridden hypotheses with the v2 axis prompts against their stored descriptions. If at least 3 of the 4 score <18/30 on v2 while having scored ≥9/15 on v1, the axis redesign discriminates correctly. Invalidating observation: all 4 also score ≥18/30 on v2, which would mean the axes are being gamed by the existing proposal text rather than improving signal.
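The decision rule above can be sketched as a small JavaScript function. It is illustrative only: the record shape ({ v1, v2 }), the name evaluateFalsifier and the "inconclusive" outcome are assumptions, because the proposal defines only the supporting case (at least 3 of 4 below 18/30 on v2 after passing v1) and the invalidating case (all 4 at or above 18/30 on v2).

```javascript
// Sketch of the Phase 1 falsifier check on the Codex back-scores.
// ASSUMPTIONS: each record holds a v1 composite (out of 15) and a v2
// composite (out of 30); the field names and the "inconclusive" outcome
// are hypothetical, since the proposal defines only the two end cases.

const V1_FLOOR = 9;  // ABSOLUTE_FLOOR on the v1 composite
const V2_FLOOR = 18; // ABSOLUTE_FLOOR_V2 on the v2 composite
const MIN_DISCRIMINATED = 3; // "at least 3 of 4"

function evaluateFalsifier(backScores) {
  // Invalidating observation: every overridden hypothesis still clears v2.
  if (backScores.every((h) => h.v2 >= V2_FLOOR)) return "invalidated";
  // Hypotheses the engine passed on v1 but that fall below the v2 floor.
  const discriminated = backScores.filter(
    (h) => h.v1 >= V1_FLOOR && h.v2 < V2_FLOOR
  ).length;
  if (discriminated >= MIN_DISCRIMINATED) return "supported";
  return "inconclusive";
}
```

Treating the middle band as inconclusive keeps the sketch from declaring success or failure on outcomes the falsifier does not address, such as only 1 or 2 of the 4 dropping below the v2 floor.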

Evidence that triggered the proposal

Corpus D: brain/META_ENGINE_S158_RED_TEAM_BRIEF.md states explicitly: 'filter_score.js currently has 5 v1 axes (legacy, in production) and there are 10 v2 columns (v2_a1..v2_a10) created by the s112 migration but not yet wired'
Corpus E: Commander overrides on 4 hypotheses (a38d31 KILL, c89a71 KILL, 6bf9c5 KILL, e9cb5c DEFER); all four passed the v1 filter but violated scalable_revenue / distribution_reachability criteria that exist in v2 but not in v1
Corpus D: brain/V2_FILTER_DESIGN_v2.3.md, a 10-axis spec ratified after 3 adversarial red-team rounds (~$2.00 total review cost), with axes designed specifically to catch the failure modes behind the Commander overrides; implementation was deferred at S112 pending that ratification, which has since occurred

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	2
solo feasible	2
blast radius	1
composability	2
reversibility	1

Disposition

Rejected at the council verdict stage: the two-judge council did not find the case strong enough to advance it to Commander review.

Evaluation history

When	Move
2026-05-23 04:34	meta_council_verdict
2026-05-23 04:20	meta_argument
2026-05-19 13:00	red_team_kill
2026-05-19 11:18	steelman
2026-05-19 10:07	meta_filter_score
2026-05-19 10:04	meta_genesis