Replace filter_score high-scorer callOpus47 → callCodexGpt55 for cross-vendor independence

council rejected PROMPT reversible: simple 2h proposed 22 May 2026

What is the proposed change?

On line 118 of hypothesis_engine/moves/filter_score.js, replace `const gptCaller = useFast ? llm.callSonnet46 : llm.callOpus47;` with `const gptCaller = useFast ? llm.callSonnet46 : llm.callCodexGpt55;`. The highSystem and userPrompt strings need no change, because callCodexGpt55 accepts the same (system, user, opts) signature. Update the model label on line 130 from 'opus+gemini' to 'codex+gemini'. Add a comment near line 118: `// cross-vendor: high=OpenAI/Codex, low=Google/Gemini, genesis=Anthropic/Sonnet`. Before merging, verify that callCodexGpt55 returns {ok, text, cost_usd} in the same shape as callOpus47; if the cost field key differs (cost instead of cost_usd), add a normalization line after the call.
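A minimal sketch of the caller swap and the cost normalization described above. The `llm` object here is a stub so the snippet runs offline; `normalizeResult`, `pickHighCaller`, and `scoreHigh` are hypothetical helper names, and the stubbed `callCodexGpt55` deliberately returns `cost` rather than `cost_usd` to exercise the mismatch case. The real module's return shape still needs to be checked.

```javascript
// Stub stand-ins for the real llm module (assumption: bodies are placeholders,
// only the (system, user, opts) signature is taken from the proposal).
const llm = {
  callSonnet46: async (system, user, opts) => ({ ok: true, text: 'fast', cost_usd: 0.001 }),
  callOpus47: async (system, user, opts) => ({ ok: true, text: 'opus', cost_usd: 0.02 }),
  // Assumed worst case: this caller reports `cost` instead of `cost_usd`.
  callCodexGpt55: async (system, user, opts) => ({ ok: true, text: 'codex', cost: 0.015 }),
};

// Hypothetical helper: coerce any caller result to {ok, text, cost_usd}.
function normalizeResult(res) {
  const cost = res.cost_usd ?? res.cost ?? 0;
  return { ok: Boolean(res.ok), text: res.text ?? '', cost_usd: cost };
}

// cross-vendor: high=OpenAI/Codex, low=Google/Gemini, genesis=Anthropic/Sonnet
function pickHighCaller(useFast) {
  return useFast ? llm.callSonnet46 : llm.callCodexGpt55;
}

// Hypothetical wrapper around the high-scorer call.
async function scoreHigh(highSystem, userPrompt, { useFast = false } = {}) {
  const gptCaller = pickHighCaller(useFast);
  const raw = await gptCaller(highSystem, userPrompt, {});
  return normalizeResult(raw);
}
```

Normalizing at the call site keeps downstream cost accounting unchanged whichever key the Codex caller turns out to use, and becomes a no-op if it already returns `cost_usd`.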

Target files

hypothesis_engine/moves/filter_score.js

Expected effect

After the change, genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (OpenAI/Codex) come from different vendors. When 10 recent hypotheses are re-scored with both pairings, the mean high-low spread per axis should widen by ≥0.15 on at least 3 of the 5 axes, indicating that the reviewers are less correlated. Hypotheses where Sonnet's genesis framing strongly anchors Opus (same model family) into agreement should show reduced agreement under Codex.

Falsifier — what would prove this wrong?

Re-score the same 10 hypotheses twice: once with the existing Opus+Gemini pairing, once with Codex+Gemini. Compute the mean of runSpreads per axis for each cohort. If the mean spread widens by less than 0.10 on every axis, vendor diversity is not producing scoring independence for this rubric. In that case the axis prompts may be so tightly constrained that vendor choice is irrelevant, which would be a distinct finding worth logging.
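The cohort comparison could be sketched as follows. This assumes each re-scored hypothesis yields an object mapping axis name to its high-minus-low spread; the axis names, `meanSpreads`, and `compareCohorts` are hypothetical, and the numbers in the usage lines are toy values, not measured results.

```javascript
// Placeholder axis names (assumption: the real rubric's five axes differ).
const AXES = ['axis1', 'axis2', 'axis3', 'axis4', 'axis5'];

// Mean spread per axis over a cohort of runs.
// Each run maps axis name -> (high score - low score).
function meanSpreads(runs, axes) {
  const out = {};
  for (const axis of axes) {
    const total = runs.reduce((sum, run) => sum + run[axis], 0);
    out[axis] = total / runs.length;
  }
  return out;
}

// Compare baseline (Opus+Gemini) against candidate (Codex+Gemini).
function compareCohorts(baselineRuns, candidateRuns, axes) {
  const base = meanSpreads(baselineRuns, axes);
  const cand = meanSpreads(candidateRuns, axes);
  const delta = {};
  for (const axis of axes) delta[axis] = cand[axis] - base[axis];
  // Small epsilon guards against floating-point noise at the threshold.
  const widenedBy = (t) => axes.filter((a) => delta[a] >= t - 1e-9);
  return {
    delta,
    // Expected effect: >= 0.15 wider on at least 3 of the 5 axes.
    expectedEffectMet: widenedBy(0.15).length >= 3,
    // Falsifier: no axis widens by >= 0.10.
    falsified: widenedBy(0.10).length === 0,
  };
}

// Toy usage with made-up spreads (two runs per cohort for brevity).
const flat = { axis1: 0.5, axis2: 0.5, axis3: 0.5, axis4: 0.5, axis5: 0.5 };
const wider = { axis1: 0.75, axis2: 0.75, axis3: 0.75, axis4: 0.5, axis5: 0.5 };
const verdict = compareCohorts([flat, flat], [wider, wider], AXES);
console.log(verdict.expectedEffectMet, verdict.falsified);
```

A run where some axis widens by at least 0.10 but fewer than three axes reach 0.15 sets neither flag; the proposal does not say how to read that middle band.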

Evidence that triggered the proposal

E — hypothesis_engine/moves/filter_score.js:118 — gptCaller resolves to callOpus47 (Anthropic) in standard mode; genesis.js uses Sonnet 4.6 (Anthropic); both caller and proposer are same vendor
D — ARCHITECT_MEMORY cross-vendor judging principle: meta_engine uses genesis=Anthropic/Sonnet, filter=OpenAI/Codex to prevent correlated errors — same principle not applied to hypothesis_engine
E — meta_engine/moves/filter_score.js — already uses callCodexGpt55 for KEEP/DROP, establishing that Codex handles structured-output filter tasks reliably in this codebase

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis	Score
specificity	3
falsifier	2
solo feasible	3
blast radius	3
composability	3
reversibility	3

Disposition

Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When	Move
2026-05-23 04:38	meta_council_verdict
2026-05-23 04:25	meta_argument
2026-05-22 04:15	meta_filter_score
2026-05-22 04:14	meta_genesis