
Replace filter_score high-scorer callOpus47 → callCodexGpt55 for cross-vendor independence

council rejected · PROMPT · reversible: simple · 2h · proposed 22 May 2026
What is the proposed change?
On line 118, replace `const gptCaller = useFast ? llm.callSonnet46 : llm.callOpus47;` with `const gptCaller = useFast ? llm.callSonnet46 : llm.callCodexGpt55;`. The highSystem prompt and userPrompt strings require no change, because callCodexGpt55 accepts the same (system, user, opts) signature. Alongside the swap:
  • Update the model label on line 130 from 'opus+gemini' to 'codex+gemini'.
  • Add a comment near line 118: `// cross-vendor: high=OpenAI/Codex, low=Google/Gemini, genesis=Anthropic/Sonnet`.
  • Verify that callCodexGpt55 returns {ok, text, cost_usd} in the same shape as callOpus47; if the cost field key differs (cost vs cost_usd), add a normalization line after the call.
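The change and the optional cost normalization could look roughly like the sketch below. It is illustrative only: the `llm` object is a stub standing in for the project's real client, the helpers `normalizeResult`, `pickHighCaller`, and `scoreHigh` are hypothetical names not taken from filter_score.js, and the stubbed Codex caller returns `cost` rather than `cost_usd` purely to exercise the mismatch case the proposal asks to verify.

```javascript
// Stub client (assumption): the real module and its return values may differ.
const llm = {
  callSonnet46: async (system, user, opts) => ({ ok: true, text: 'sonnet', cost_usd: 0.01 }),
  // Deliberately returns `cost` to simulate a differing cost key.
  callCodexGpt55: async (system, user, opts) => ({ ok: true, text: 'codex', cost: 0.02 }),
};

// Hypothetical helper: coerce any caller result into {ok, text, cost_usd}.
function normalizeResult(res) {
  const cost = res.cost_usd ?? res.cost ?? 0;
  return { ok: Boolean(res.ok), text: res.text ?? '', cost_usd: cost };
}

// cross-vendor: high=OpenAI/Codex, low=Google/Gemini, genesis=Anthropic/Sonnet
function pickHighCaller(useFast) {
  return useFast ? llm.callSonnet46 : llm.callCodexGpt55;
}

// Hypothetical wrapper showing where the normalization line would sit.
async function scoreHigh(useFast, highSystem, userPrompt, opts = {}) {
  const gptCaller = pickHighCaller(useFast);
  const raw = await gptCaller(highSystem, userPrompt, opts);
  return normalizeResult(raw);
}
```

If callCodexGpt55 already returns `cost_usd`, the normalization step is a no-op and can be dropped.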
Target files
hypothesis_engine/moves/filter_score.js
Expected effect
With this change, genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (now OpenAI/Codex) come from different vendors. On re-scoring 10 recent hypotheses with both versions, the mean high-low spread per axis should widen by ≥0.15 on at least 3 of 5 axes, indicating that the reviewers are less correlated. Hypotheses on which Sonnet's genesis framing strongly anchors Opus (same model family) into agreement should show reduced agreement once Codex replaces Opus.
Falsifier — what would prove this wrong?
Re-score the same 10 hypotheses twice: once with the existing Opus+Gemini pairing, once with Codex+Gemini. Compute mean runSpreads per axis for each cohort. If the mean spread does not widen by ≥0.10 on any axis between the two cohorts, vendor diversity is not producing scoring independence for this rubric. In that case the axis prompts may be so tightly constrained that vendor choice is irrelevant, which would be a distinct finding worth logging.
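The comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the input shape (one object per hypothesis, mapping axis name to `{ high, low }` scores) is hypothetical, spread is taken as `high - low`, which may not match how `runSpreads` is actually computed, and the function names are invented for the example. The 0.15-on-3-axes and 0.10-on-any-axis thresholds come from the expected effect and the falsifier respectively.

```javascript
// Mean (high - low) per axis over a cohort of scored hypotheses.
// Assumed input: [{ axisName: { high: number, low: number }, ... }, ...]
function meanSpreadPerAxis(cohort) {
  const sums = {};
  const counts = {};
  for (const scores of cohort) {
    for (const [axis, { high, low }] of Object.entries(scores)) {
      sums[axis] = (sums[axis] ?? 0) + (high - low);
      counts[axis] = (counts[axis] ?? 0) + 1;
    }
  }
  const means = {};
  for (const axis of Object.keys(sums)) means[axis] = sums[axis] / counts[axis];
  return means;
}

// Axes whose mean spread grew by at least `threshold` from baseline to candidate.
function axesWidenedBy(baseline, candidate, threshold) {
  return Object.keys(baseline).filter(
    (axis) => axis in candidate && candidate[axis] - baseline[axis] >= threshold
  );
}

// Applies both thresholds from the proposal to the two cohorts.
function evaluate(opusCohort, codexCohort) {
  const before = meanSpreadPerAxis(opusCohort);
  const after = meanSpreadPerAxis(codexCohort);
  return {
    before,
    after,
    expectedEffectMet: axesWidenedBy(before, after, 0.15).length >= 3,
    falsified: axesWidenedBy(before, after, 0.10).length === 0,
  };
}
```

With only 10 hypotheses per cohort, differences near either threshold are likely to be noisy, so logging the per-axis means alongside the two booleans would make a borderline result easier to interpret.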
Evidence that triggered the proposal
  • E — hypothesis_engine/moves/filter_score.js:118 — gptCaller resolves to callOpus47 (Anthropic) in standard mode; genesis.js uses Sonnet 4.6 (Anthropic); the high-score caller and the proposer are therefore the same vendor
  • D — ARCHITECT_MEMORY cross-vendor judging principle: meta_engine uses genesis=Anthropic/Sonnet, filter=OpenAI/Codex to prevent correlated errors — same principle not applied to hypothesis_engine
  • E — meta_engine/moves/filter_score.js — already uses callCodexGpt55 for KEEP/DROP, establishing that Codex handles structured-output filter tasks reliably in this codebase

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       2
solo feasible   3
blast radius    3
composability   3
reversibility   3
Disposition
Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When              Move
2026-05-23 04:38  meta_council_verdict
2026-05-23 04:25  meta_argument
2026-05-22 04:15  meta_filter_score
2026-05-22 04:14  meta_genesis