Meta engine — engine improving itself

A third lane that does not propose products. It proposes changes to the engine that proposes products. Each candidate is a concrete, falsifiable, solo-feasible modification — a new filter axis, a corpus tweak, a prompt revision, a tool, a harness. Same evaluation discipline as the other lanes, applied inward.

Accepted 4

Most recent 23 May 2026

Add 'evidence' to meta_engine genesis validateProposal required fields and structure-check it

23 May 2026 proposed

approved TOOL reversible: simple 1h

Proposals with no evidence array, empty evidence, or evidence items missing source_corpus/source will be rejected at validation time with a specific reason rather than silently persisting with evidence=[]. Retrocheck: `SELECT id, title FROM hypotheses WHERE lane='meta' AND json_e…
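The structure check described above could look roughly like the sketch below. It is illustrative only: `checkEvidence`, `isNonEmptyString`, the reason strings, and the return shape are hypothetical, and it assumes both `source_corpus` and `source` must be present on every item. The real `validateProposal` signature is not shown in the source.

```javascript
// Hypothetical helper: true only for strings with visible content.
function isNonEmptyString(value) {
  return typeof value === "string" && value.trim().length > 0;
}

// Sketch of an evidence structure check that returns a specific reason
// instead of letting a proposal persist with evidence=[].
// Assumption: both source_corpus and source are required on each item.
function checkEvidence(proposal) {
  const evidence = proposal.evidence;
  if (!Array.isArray(evidence)) {
    return { ok: false, reason: "evidence_missing" };
  }
  if (evidence.length === 0) {
    return { ok: false, reason: "evidence_empty" };
  }
  for (let i = 0; i < evidence.length; i++) {
    const item = evidence[i];
    const isObject = item !== null && typeof item === "object";
    if (!isObject || !isNonEmptyString(item.source_corpus) || !isNonEmptyString(item.source)) {
      return { ok: false, reason: `evidence_item_${i}_incomplete` };
    }
  }
  return { ok: true, reason: null };
}
```

Returning a named reason per failure mode is what lets a rejected proposal be diagnosed later without re-running genesis.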

Fix meta_engine genesis validator: cap solo_time_estimate at 16h not 24h

23 May 2026 proposed

approved TOOL reversible: simple 1h

Proposals with solo_time_estimate in the 17-24h range are now caught by validateProposal() and routed to rejected[] instead of persisting to the DB. The enforcement gap between the system prompt contract (16h) and the validator implementation (24h) is closed. Historical retrochec…
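A minimal sketch of the cap, assuming `solo_time_estimate` is a number of hours and that the validator splits its input into accepted and rejected lists. `partitionByTimeEstimate` and `MAX_SOLO_HOURS` are hypothetical names; only the 16h limit and the `rejected[]` routing come from the text above.

```javascript
// 16h is the cap stated in the system prompt contract.
const MAX_SOLO_HOURS = 16;

// Sketch: route proposals with a missing, non-positive, or over-cap
// estimate to rejected[] with a specific reason.
function partitionByTimeEstimate(proposals) {
  const accepted = [];
  const rejected = [];
  for (const proposal of proposals) {
    const hours = Number(proposal.solo_time_estimate);
    if (!Number.isFinite(hours) || hours <= 0 || hours > MAX_SOLO_HOURS) {
      rejected.push({
        proposal,
        reason: `solo_time_estimate must be > 0 and <= ${MAX_SOLO_HOURS}h`,
      });
    } else {
      accepted.push(proposal);
    }
  }
  return { accepted, rejected };
}
```

Keeping the limit in one named constant makes it harder for the prompt contract and the validator to drift apart again.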

Add NBJ 5-question describability pre-check at start of argument.js

18 May 2026 proposed

accepted with revision shadow mode GATE reversible: simple 6h

Applied to 43 S157 graduated candidates: hyp-2026-05-06-847f7e (0/5 on S157 manual review) is killed before argument; none of the 25 ROBUST candidates (4-5/5) are killed. Per the move cost rollup, argument + council_verdict + 7 deep moves average approximately $0.12-0.18 per hypo…

Tighten solo_founder_feasible evaluator to score first-10-customer GTM

18 May 2026 proposed

accepted with revision PROMPT reversible: simple 2h

Back-scoring hyp-2026-05-14-d3786b (Agronomy Advisory for UK soft-fruit and glasshouse growers — institutional trade-channel buyers) and hyp-2026-05-11-cc72cd (Bot-Promise Slip for B2B Support Ops — enterprise procurement buyers) with revised prompt produces solo_founder_feasible…

Deferred / rejected 47

Most recent 31 May 2026

Add cross-vendor judge enforcement gate at council_verdict

31 May 2026 proposed

filter rejected GATE reversible: simple 3h

Catches silent same-vendor judging cases that violate the S158 cross-vendor principle. Over 30 cycles, expect 0-3 enforcement-triggered rejudges (low because doctrine is mostly followed); a rate >10% means the routing config has drifted, and correcting that config is the real fix.
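One way to picture the gate and its drift signal is sketched below. The record fields `authorVendor` and `judgeVendor` and both function names are hypothetical; the source does not show how council_verdict records vendors. The 10% threshold is the one stated above.

```javascript
// Sketch: a verdict violates the cross-vendor principle when the model that
// produced the material and the model that judged it share a vendor.
function isSameVendorJudging(record) {
  const author = String(record.authorVendor).trim().toLowerCase();
  const judge = String(record.judgeVendor).trim().toLowerCase();
  return author === judge;
}

// Summarise a window of cycles: how many rejudges the gate would trigger,
// and whether the rate suggests the routing config itself has drifted.
function enforcementSummary(records) {
  const flagged = records.filter(isSameVendorJudging);
  const rate = records.length === 0 ? 0 : flagged.length / records.length;
  return {
    rejudgeCount: flagged.length,
    rate,
    routingDriftSuspected: rate > 0.1,
  };
}
```

Over a 30-cycle window, up to 3 flagged records stays at or below the 10% line, which matches the expected 0-3 rejudges.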

Reweight synthetic.jsonl corpus by per-item graduation lift

31 May 2026 proposed

filter rejected CORPUS reversible: simple 7h

Filter-kept rate of genesis output rises 10-25% within 30 cycles, because dead synthetic items stop being retrieved at uniform rate. Top-quartile synthetic items get retrieved 2-3x more often.
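A rough sketch of lift-based reweighting, under assumptions the source does not confirm: each synthetic item carries hypothetical `retrieved` and `graduated` counters, lift is the item's graduation rate divided by the corpus-wide rate, never-retrieved items keep a neutral lift of 1, and a small floor keeps dead items retrievable at a low rate rather than removing them.

```javascript
// Sketch: convert per-item graduation lift into normalised retrieval weights.
// All field names and the floor value are illustrative assumptions.
function retrievalWeights(items, floor = 0.1) {
  const totalRetrieved = items.reduce((sum, it) => sum + it.retrieved, 0);
  const totalGraduated = items.reduce((sum, it) => sum + it.graduated, 0);
  const baseline = totalRetrieved > 0 ? totalGraduated / totalRetrieved : 0;

  const raw = items.map((it) => {
    // No evidence yet (or no baseline): stay neutral rather than punish.
    if (it.retrieved === 0 || baseline === 0) return 1;
    const lift = it.graduated / it.retrieved / baseline;
    return Math.max(lift, floor);
  });

  const total = raw.reduce((sum, w) => sum + w, 0);
  return items.map((it, i) => ({ id: it.id, weight: raw[i] / total }));
}
```

The floor is a design choice: it trades a little retrieval budget for the ability to notice if a previously dead item starts producing graduates.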

Add empty-noun-phrase detector tool gating pre-filter

31 May 2026 proposed

council rejected TOOL reversible: simple 6h

Cuts filter-kept rate of 847f7e-shape proposals by ≥50% relative to current baseline, without adding LLM cost. Over 30 cycles, ≥80% of detector-flagged items are also human-rated as empty.

Add genesis backend-fallback harness for context overflow

31 May 2026 proposed

filter rejected HARNESS reversible: simple 5h

Eliminates the silent 5-cycle genesis outage class observed in S180. Over 30 cycles, ≥98% of genesis attempts produce output. Fallback chain is exercised on <10% of cycles (otherwise input corpus is too large and needs trimming, not fallback).
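The fallback chain could be sketched as follows. Backends are modelled as synchronous functions for brevity (real model calls would be asynchronous), and the `CONTEXT_OVERFLOW` error code, the `generate` method, and `runWithFallback` are hypothetical; the source does not say how overflow is reported.

```javascript
// Hypothetical overflow signal; an assumption, not the engine's real error shape.
function isContextOverflow(err) {
  return Boolean(err) && err.code === "CONTEXT_OVERFLOW";
}

// Sketch: try each backend in order, falling back only on context overflow.
// If every backend overflows, fail loudly so the outage is never silent.
function runWithFallback(backends, prompt) {
  const failures = [];
  for (let i = 0; i < backends.length; i++) {
    const backend = backends[i];
    try {
      const output = backend.generate(prompt);
      return { output, backendUsed: backend.name, fallbackUsed: i > 0, failures };
    } catch (err) {
      // Other error classes are not masked by the fallback chain.
      if (!isContextOverflow(err)) throw err;
      failures.push(backend.name);
    }
  }
  throw new Error(`genesis failed: context overflow on all backends (${failures.join(", ")})`);
}
```

Recording `fallbackUsed` per cycle is what would make the <10% exercise rate above measurable.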

Add falsifier-coherence gate between argument and council_verdict

31 May 2026 proposed

filter rejected GATE reversible: simple 6h

Catches drift cases where argument constructs a sharp falsifier and council silently substitutes a softer one. Over 30 cycles, expect 5-15% of currently-approved proposals to be rejected for falsifier_drift, surfacing a class of error that pending_commander review currently absor…

Add v2_a11 describability axis scored via S157 NBJ 5-question rubric

31 May 2026 proposed

filter rejected AXIS reversible: simple 4h

On the 43 historically-graduated candidates, 847f7e-shape proposals (empty noun phrases) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 2-3 point spread on the composite. Filter-kept rate of 847f7e-shape drops ≥30%.

Add Corpus T signal-age decay axis v2_a12 for stale search items

30 May 2026 proposed

filter rejected AXIS reversible: simple 4h

Proposals grounded only in stale Corpus T (e.g. older SerpAPI sweeps from S91-S104) score ≥1 lower than proposals grounded in current week's digest entries. On a 30-day backtest, composite-score correlation with Commander 'still relevant' subjective tag should improve by ≥0.15 Pe…

Add genesis JSON output validator harness with single retry

30 May 2026 proposed

council rejected HARNESS reversible: simple 6h

Genesis-induced cycle failures (silent outage class — see fix in commit 0f2d20d for Bedrock Opus 4.6) drop to zero over 60 days. Downstream filter_score.js never receives a malformed proposal in production. Retry rate sits between 5-15% (signal that the validator is firing) but f…
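A validator with a single retry might be shaped like the sketch below. The generator is modelled as a synchronous callback for brevity, and the expected output shape (a non-empty JSON array of objects with a string `title`) is an assumption; the real genesis schema is not shown in the source.

```javascript
// Sketch: parse and structure-check raw genesis output.
// The required shape here is illustrative, not the engine's actual schema.
function parseProposals(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { ok: false, reason: "invalid_json" };
  }
  if (!Array.isArray(parsed) || parsed.length === 0) {
    return { ok: false, reason: "not_a_nonempty_array" };
  }
  const wellFormed = parsed.every(
    (p) => p !== null && typeof p === "object" && typeof p.title === "string"
  );
  return wellFormed ? { ok: true, proposals: parsed } : { ok: false, reason: "malformed_item" };
}

// Call the generator at most twice; throw if both outputs are invalid so a
// malformed proposal never reaches downstream scoring.
function generateWithSingleRetry(generate) {
  let last = null;
  for (let attempt = 1; attempt <= 2; attempt++) {
    last = parseProposals(generate(attempt));
    if (last.ok) return { proposals: last.proposals, retried: attempt === 2 };
  }
  throw new Error(`genesis output invalid after one retry: ${last.reason}`);
}
```

Logging the `retried` flag per run would give the retry rate that the falsifier above uses as its signal.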

Add cross-judge fatal-objection consensus short-circuit gate

30 May 2026 proposed

filter rejected GATE reversible: simple 4h

The 2 fatal_objection_both_confirm kills in the current trace would have skipped council, saving $0.20 across those cycles. Council_verdict move cost should drop ~15-25% over a 30-day window without changing the final kept-vs-killed distribution.

Add v2_a11 commander-kill-likelihood axis from override history

30 May 2026 proposed

filter rejected AXIS reversible: simple 8h

Across the next 20 cycles, candidates that the engine would have produced and council-passed but Commander would have killed should score ≤1 on v2_a11 at least 60% of the time. Composite score spread between Commander-KILL-shape candidates and Commander-pass-shape candidates ≥ 1.…

Add structural-duplicate gate before meta_argument move

30 May 2026 proposed

filter rejected GATE reversible: simple 5h

On the 5-day Phase 1 trace (7× v2_backfill_orphan + 1× structural_duplicate_15ed71), at least 2-3 of those kills migrate from post-argument to pre-argument, saving ~$0.20-0.33 in argument cost per Phase 1 week. Argument move cost drops measurably in the next cycle telemetry.

Create hypothesis_engine/tools/orphan_scanner.js diagnostic script

23 May 2026 proposed

council rejected TOOL reversible: simple 2h

Report reveals exact count of live orphaned rows (expected >0 given 7 kills in 7-day window with kill_reason='v2_backfill_orphan_S148'). Provides move_waste count per orphan to quantify GATE proposal value before implementation. If orphan_count=0, the GATE proposal (P1) is moot —…

Add composability axis to meta_engine filter_score judge prompt

23 May 2026 proposed

filter rejected PROMPT reversible: simple 3h

When two proposals both target the same file and modify overlapping function blocks, the judge flags composability=major on at least one. Across 30 meta_filter_score runs (current observed volume), this surfaces implementation ordering conflicts before they reach Commander approv…
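The overlap condition itself is mechanical and can be sketched without a judge model. The version below assumes each proposal lists a hypothetical `file` and a `functions` array naming the blocks it modifies; the proposal above asks the judge prompt to make this call, so this is only an illustration of the rule, not the proposed implementation.

```javascript
// Sketch: flag pairs of proposals that touch the same file and share at
// least one modified function. Field names are illustrative assumptions.
function findComposabilityConflicts(proposals) {
  const conflicts = [];
  for (let i = 0; i < proposals.length; i++) {
    for (let j = i + 1; j < proposals.length; j++) {
      const a = proposals[i];
      const b = proposals[j];
      if (a.file !== b.file) continue;
      const shared = a.functions.filter((fn) => b.functions.includes(fn));
      if (shared.length > 0) {
        conflicts.push({ ids: [a.id, b.id], file: a.file, shared, composability: "major" });
      }
    }
  }
  return conflicts;
}
```

Same function name in different files is deliberately not a conflict here, since the stated condition requires the same target file.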

Add GW01 veto block to genesis.js for audit/verification cluster

23 May 2026 proposed

filter rejected PROMPT reversible: simple 2h

Genesis output rate for GW01-shaped proposals drops to <10% (estimated baseline ~23%, derived from 4 Commander kills against ~17 non-orphan proposals in the 7-day kill window). This is the earliest intervention point — preventing generation rather than catching proposals post-hoc…

Add v2_a11 product_shape_gravity_resistance axis to filter_score.js

23 May 2026 proposed

filter rejected AXIS reversible: simple 4h

The 4 Commander-killed hypotheses (a38d31 = AI Control Failure Forecast Audit, e9cb5c = Reality-Graded Upgrade Gate, c89a71 = ClaimGate for B2B SaaS, 6bf9c5 = AI Tool Claim Verification) all score 0 on v2_a11. Graduated candidates score 2-3. The axis adds a 2-3 point composite sp…

Pre-kill gate: eliminate v2_backfill_orphan_S148 rows before move dispatch

23 May 2026 proposed

rejected GATE reversible: simple 4h

The 7 v2_backfill_orphan_S148 kills (58% of all kills in 7-day window) are pre-empted before any moves are dispatched. Each orphaned row currently burns evidence_search, red_team_kill, and steelman moves before the kill fires — at the observed move:kill ratio (~360 moves / 12 kil…

Auto-expire S112 PRE-SEND-GATE doctrine block in genesis.js by run-count check

22 May 2026 proposed

filter rejected PROMPT reversible: simple 3h

With genesisRunCount ≈33 (33 days × ~1 run/day since 2026-04-19), S112 block is absent from PROPOSER_SYSTEM on next run. PROPOSER_SYSTEM character count drops by approximately the S112 block length (~900 chars, lines 88-106). Over the next 10 genesis runs without S112: if post-ho…

Add Commander override JSONL extractor as structured Corpus E feed for meta_engine genesis

22 May 2026 proposed

filter rejected CORPUS reversible: simple 4h

The next meta genesis run after deploy will include the 3 recent Commander KILLs (ClaimGate, AI Tool Claim Verification, AI Control Failure) and 1 DEFER (Upgrade Gate) as explicit Corpus E items. At least 1 proposal in that run should cite 'commander_override' in its evidence fie…

Replace filter_score high-scorer callOpus47 → callCodexGpt55 for cross-vendor independence

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

Genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (now OpenAI/Codex) are different vendors. On re-scoring 10 recent hypotheses with both versions, mean high-low spread per axis should widen by ≥0.15 across at least 3 of 5 axes, indicating the reviewers are less correlated.…

Create hypothesis_engine/scripts/backtest_filter_axes.js for axis retrospective validation

21 May 2026 proposed

filter rejected TOOL reversible: simple 5h

Running node hypothesis_engine/scripts/backtest_filter_axes.js --axis=v2_a11 --labels=s157_labels.csv against the 43 S157 candidates completes in <30 seconds and produces a JSONL file enabling Mann-Whitney U test between ROBUST and FRAGILE axis score distributions. This replaces …
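For reference, the Mann-Whitney U statistic that such a backtest would feed can be computed from two score arrays as sketched below. This is a generic implementation of the standard rank-sum formula with average ranks for ties, not the contents of the proposed script; it returns the statistic only and does not compute a p-value.

```javascript
// Sketch: Mann-Whitney U for two samples, using average ranks for ties.
// uA counts pairs where a value from A outranks a value from B
// (ties count as half); uB is the complement.
function mannWhitneyU(a, b) {
  const pooled = [
    ...a.map((value) => ({ value, fromA: true })),
    ...b.map((value) => ({ value, fromA: false })),
  ].sort((x, y) => x.value - y.value);

  let rankSumA = 0;
  let i = 0;
  while (i < pooled.length) {
    // Find the run of tied values starting at i.
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].value === pooled[i].value) j++;
    const averageRank = (i + 1 + (j + 1)) / 2; // ranks are 1-based
    for (let k = i; k <= j; k++) {
      if (pooled[k].fromA) rankSumA += averageRank;
    }
    i = j + 1;
  }

  const uA = rankSumA - (a.length * (a.length + 1)) / 2;
  const uB = a.length * b.length - uA;
  return { uA, uB, u: Math.min(uA, uB) };
}
```

With ROBUST axis scores as `a` and FRAGILE scores as `b`, a `u` near 0 indicates the two distributions barely overlap.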

Add exception-class prose instruction to genesis.js PROPOSER_SYSTEM (S157-EC rule block)

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

Within 20 genesis runs post-deployment, ≥50% of generated hypotheses will contain the phrase 'exception class' or 'route to' or 'does not handle automatically' in their description field. Baseline rate is 2.3% (1/43 in S157 NBJ sweep). A grep over genesis output logs is sufficien…

Add commodity-wedge shadow check at argument.js entry point (Survivor E)

21 May 2026 proposed

filter rejected GATE reversible: simple 6h

Flag rate in first 30 days should fall between 5-15% of argument runs. Flagged hypotheses should show lower graduation rates than unflagged over a 60-day observation window. If the commander-killed AI control / ClaimGate / AI Tool Verification for Agencies hypotheses are run thro…

Add non-convergence telemetry sink to council_verdict.js ESCALATED path (D.v2)

21 May 2026 proposed

filter rejected GATE reversible: simple 3h

After 30 days, meta_engine/data/non_convergent/ will contain a corpus of ESCALATED transcripts. If the axis-delta field shows the same 1-2 axes driving disagreement across ≥60% of cases, those axes have ambiguous rubrics that can be sharpened. If zero files appear despite ESCALAT…

Add v2_a11 status_quo_displacement_ease axis to filter_score.js v2 rubric

21 May 2026 proposed

filter rejected AXIS reversible: simple 4h

The 3 commander-KILLed proposals in the last 7 days (AI control, ClaimGate, AI Tool Verification for Agencies) would score 0 on this axis due to entrenched audit/verification incumbents with regulatory switching costs. Retrospective on 43 S157 NBJ candidates should show statistic…

Add commodity-wedge shadow gate logging in runFatalObjection

20 May 2026 proposed

filter rejected GATE reversible: simple 6h

After 60 days of shadow logging, score ≥3 correlates with ≥70% eventual KILL verdicts. On a 12-item calibration set containing 3 known commodity-wedge hypotheses and 4 known ROBUST graduates, all 3 commodity-wedge items score ≥3 and all 4 ROBUST items score ≤1.

Add decisive_questions field to council_verdict Round 3 output schema

23 May 2026 proposed

council rejected PROMPT reversible: simple 4h

Re-running escalated hypotheses 7199a9 and 2ca131 through the updated prompt each produces ≥2 distinct, non-overlapping testable questions naming specific observables (e.g., 'Does [named buyer segment] currently pay for a partial solution from [named competitor]?' rather than 'Is…

Wire v2_a7 distribution_reachability into filter_score.js

20 May 2026 proposed

filter rejected AXIS reversible: simple 5h

d3786b-shaped hypotheses (cold outbound dependency, no owned channel) score v2_a7=0. ec4507-shaped hypotheses (tool-embedded, existing user base) score v2_a7=2-3. A 4-candidate test set spanning the distribution reachability spectrum produces a score range of ≥3 points on this ax…

Wire v2_a6 acute_pain_not_candy into filter_score.js

20 May 2026 proposed

filter rejected AXIS reversible: simple 5h

Of the 7 hypotheses killed in the last 7 days for 'wrong distribution shape or pain framing,' at least 5 score v2_a6 ≤1. ec4507-type hypotheses (acute pain, adjacent spend evidence) score v2_a6 ≥2. The pre-council kill rate for candy-shaped hypotheses increases measurably, reduci…

Append full escalation transcript to non-convergent telemetry sink

20 May 2026 proposed

filter rejected TOOL reversible: simple 3h

Every future council escalation produces a persisted JSONL record. After 30 days, the non_convergent/ directory contains records for ≥95% of escalation events as verified by comparing JSONL entry count to council_verdict escalated=true rows in engine.db.
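The coverage comparison described above reduces to counting parseable JSONL records against the escalated row count. In the sketch below the row count is passed in as a plain number; how it is queried from engine.db is left out, and `sinkCoverage` is a hypothetical name.

```javascript
// Sketch: compare persisted JSONL records with the number of escalated
// council_verdict rows and report whether the >=95% target is met.
function sinkCoverage(jsonlText, escalatedRowCount) {
  const lines = jsonlText.split("\n").filter((line) => line.trim() !== "");
  let records = 0;
  for (const line of lines) {
    try {
      JSON.parse(line);
      records += 1; // only well-formed records count toward coverage
    } catch (err) {
      // A corrupt line is treated as a missing record.
    }
  }
  const coverage = escalatedRowCount === 0 ? 1 : records / escalatedRowCount;
  return { records, coverage, meetsTarget: coverage >= 0.95 };
}
```

Counting only parseable lines means a sink that writes truncated records fails the check rather than passing on line count alone.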

Add reeval_in_days to GATHER_MORE_SIGNAL verdicts and deferred-reeval sweep in scheduler.js

19 May 2026 proposed

filter rejected HARNESS reversible: medium 8h

5 of 9 recent council verdicts (56%) contain explicit temporal deferral conditions ('run Week 1 outbound before building', '7-day artifact-upload test', '7-day signal check'). These hypotheses currently sit in GATHER_MORE_SIGNAL indefinitely with no automatic re-evaluation. After…

Implement E: commodity-wedge shadow check before first argument move (five binary axes)

19 May 2026 proposed

filter rejected GATE reversible: simple 6h

The 'RevOps Objection Taxonomy Normalizer' shape (GPT-5.5-Pro Round 1: passes describability, observed-buyer, solo-inbound, yet still structurally weak on urgency and data advantage) flags commodity_wedge=true on axes 3+4+5. After 4 weeks of shadow: hypotheses where commodity_wed…

Implement B.v2: NBJ 5-Q describability shadow gate before first argument move

23 May 2026 proposed

council rejected GATE reversible: simple 8h

Codex retrospective on 43 S157-scored candidates: gate kills hyp-2026-05-06-847f7e (S157 score 0/5, structurally fragile on all 5 Q dimensions) and does not kill any of the three 5/5 ROBUST candidates (ec4507, 24a849, 3656a0). Spearman rank correlation between gate composite_scor…

Implement D.v2: non-convergence transcript sink in council_verdict.js

23 May 2026 proposed

council rejected TOOL reversible: simple 4h

Current non-convergence rate: 22% (2 of 9 recent verdicts = 'council could not converge after 3 rounds'). At 9 verdicts/week, 30 days produces ~8–10 transcripts. The escalationReason field already distinguishes FACTUAL vs WEIGHTING vs FRAMING disagreements. The corpus enables the…

Wire v2 axes A1–A10 into hypothesis_engine/moves/filter_score.js (shadow-first)

23 May 2026 proposed

council rejected ARCHITECTURE reversible: medium 12h

The 4 Commander-killed hypotheses (a38d31, c89a71, 6bf9c5, e9cb5c) should score ≤18/30 on v2 composite due to failing scalable_revenue (audit-shaped), distribution_reachability (no warm-contact base noted in override reason), and commander_non_engine_work_fit. The v1 composite sc…

Restructure council_verdict.js cheapest_instant_kill_test from string to machine-readable object

19 May 2026 proposed

filter rejected PROMPT reversible: simple 4h

Every GATHER_MORE_SIGNAL verdict (currently ~55% of verdicts per S157 distribution: 25 ROBUST + 13 MIXED + 4 FRAGILE + 1 STRUCTURALLY FRAGILE out of 43) produces a machine-readable gate condition that Commander can execute without re-reading the full reasoning. The 2 non-converge…

Add timing_window structured field to genesis.js PROPOSER_SYSTEM output schema

23 May 2026 proposed

council rejected PROMPT reversible: simple 3h

Genesis outputs for commodity or evergreen problems (e.g. taxonomy normalizers, knowledge-base tools) will produce window_state=stable or structural with decay_horizon=3_months, signaling low timing defensibility. Proposals tied to genuine substrate shifts (agent-era trust gaps, …

Require empirical failure analog in argument.js red_team_kill attacker output

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

Argument transcripts will contain named companies in attacker rounds, enabling council to distinguish theoretical objections from documented failure patterns. The RevOps taxonomy shape (S158 Round 1 survivor shape that passes all describability/reachability checks) should generat…

Add validation_week_test structured field to genesis.js PROPOSER_SYSTEM output schema

23 May 2026 proposed

council rejected PROMPT reversible: simple 3h

The 5 recent council verdicts (5d7cca, 26fc18, cc72cd, 90778c, c27754) each independently invented ad-hoc week-1 tests. Post-change, genesis outputs carry those tests, so council can evaluate their credibility rather than invent them. 10 consecutive genesis outputs should contain…

Replace v1 five-axis scoring in filter_score.js with ratified v2.3 ten-axis rubrics

19 May 2026 proposed

filter rejected PROMPT reversible: medium 8h

Hypotheses like Commander-KILLED a38d31 (audit product, no warm-contact base) and c89a71 (ClaimGate, relational sales) should score A5=0 (scalable_revenue: pure audit service) and A7=0–1 (distribution_reachability: warm intros required), producing v2 composites below 40% even if …

Add shadow commodity-wedge gate to argument.js pre-debate (no hard kill)

23 May 2026 proposed

council rejected GATE reversible: simple 6h

RevOps Objection Taxonomy Normalizer shape (CRM-integrated taxonomy, no urgency event, dashboard deliverable) flags commodity_wedge_recommendation=true. hyp-2026-05-06-ec4507 (Support Escalation: SLA deadline forcing function, Zendesk timestamp as external ground truth, not CRM-d…

Tighten fast_feedback_loops rubric to penalize multi-causal outcome attribution

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

hyp-2026-05-06-847f7e (Support Promise Calibration Console — killed because 'CSAT/SLA outcomes are multi-causal') scores 0-1 on fast_feedback_loops under the revised rubric. hyp-2026-05-13-47730e (AI Portfolio Claim Auditor — killed because 'board verdicts multi-causal') scores 0…

Prompt genesis proposer to reason about exception classes inline

23 May 2026 proposed

council rejected PROMPT reversible: simple 2h

After 20 genesis runs post-patch, at least 30% of hypothesis descriptions contain explicit exception-class language ('The workflow excludes...', 'Human review is required when...', or equivalent). Current S157 baseline: 1/43 graduated candidates (2.3%) explicitly named exception …

Add v2_a11 urgency_event_named shadow axis to filter_score.js

19 May 2026 proposed

filter rejected AXIS reversible: simple 5h

RevOps Objection Taxonomy Normalizer shape (taxonomy/analytics, CRM-integrated, no named urgency event) scores 0-1. hyp-2026-05-06-ec4507 (Support Escalation with SLA breach consequences and renewal triggers) scores 2-3. Retrospective application to 43 S157-scored candidates show…

Log non-convergent council transcripts to data/non_convergent/ JSONL

19 May 2026 proposed

filter rejected TOOL reversible: simple 3h

After 30 days of active council cycles (current rate ~9/week, empirical non-convergence rate 22%), directory accumulates 8-12 non-convergent transcripts. This corpus enables first structured analysis of split-reason taxonomy: whether splits cluster by hypothesis type, ICP, or mod…

Add observed-buyer pre-check at start of argument.js to skip no-market debates

18 May 2026 proposed

deferred GATE reversible: simple 5h

Of 9 recent deep_council_verdict runs, at least 3 of the 5 killed hypotheses would be caught here: d3786b (Agronomy Advisory — no observable paying buyer for AI-powered agronomy ledgers), c27754 (Medical-Device SME buying AI components — buyer leverage unverified), cc72cd (Bot-Pr…

Add exception_classes field to genesis.js proposal output schema

18 May 2026 proposed

deferred PROMPT reversible: simple 3h

After 2 genesis cycles (~62 new hypotheses at current 31/week throughput), at least 55% of generated proposals include a non-empty exception_classes field with ≥2 distinct named situations (not paraphrases). The companion exception_class_named axis (Proposal 1) will show mean sco…

Add exception_class_named as 6th v1 axis in filter_score.js

18 May 2026 proposed

deferred AXIS reversible: simple 5h

Applied retroactively to 43 S157-graduated candidates: hyp-2026-04-19-3656a0 (ec4507, cited explicit scope exclusions) scores 2-3; hyp-2026-05-06-847f7e (zero exception classes anywhere) scores 0; at least 38 of 43 score 0-1, producing a minimum 2-point composite spread between R…