Meta engine — engine improving itself

A third lane that does not propose products. It proposes changes to the engine that proposes products. Each candidate is a concrete, falsifiable, solo-feasible modification — a new filter axis, a corpus tweak, a prompt revision, a tool, a harness. Same evaluation discipline as the other lanes, applied inward.

Accepted 4

Most recent 23 May 2026
approved TOOL reversible: simple 1h
Proposals with no evidence array, empty evidence, or evidence items missing source_corpus/source will be rejected at validation time with a specific reason rather than silently persisting with evidence=[]. Retrocheck: `SELECT id, title FROM hypotheses WHERE lane='meta' AND json_e…
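A minimal sketch of the evidence check described above, assuming each evidence item is an object that must carry both `source_corpus` and `source`. The function name `checkEvidence` and the reason strings are hypothetical; the engine's actual validator may be structured differently.

```javascript
// Hypothetical helper: returns a specific rejection reason instead of
// letting a proposal persist with evidence=[].
// Assumption: every evidence item needs both source_corpus and source.
function checkEvidence(proposal) {
  const evidence = proposal.evidence;
  if (!Array.isArray(evidence)) {
    return { ok: false, reason: "evidence_missing" };
  }
  if (evidence.length === 0) {
    return { ok: false, reason: "evidence_empty" };
  }
  // Find the first item that lacks one of the required source fields.
  const badIndex = evidence.findIndex(
    (item) => !item || !item.source_corpus || !item.source
  );
  if (badIndex !== -1) {
    return { ok: false, reason: `evidence_item_${badIndex}_missing_source` };
  }
  return { ok: true, reason: null };
}
```

Returning a named reason per failure mode keeps rejected proposals auditable, which is the point of rejecting at validation time rather than persisting silently.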
approved TOOL reversible: simple 1h
Proposals with solo_time_estimate in the 17-24h range are now caught by validateProposal() and routed to rejected[] instead of persisting to the DB. The enforcement gap between the system prompt contract (16h) and the validator implementation (24h) is closed. Historical retrochec…
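The 16h cap could be enforced along these lines. This is a sketch under stated assumptions: `solo_time_estimate` is taken to be a number of hours, and `partitionBySoloTime` and `MAX_SOLO_HOURS` are illustrative names, not the real internals of validateProposal().

```javascript
// Cap taken from the system prompt contract (16h), not the old validator value (24h).
const MAX_SOLO_HOURS = 16;

// Hypothetical partition step: proposals over the cap, or with a missing or
// non-numeric estimate, go to rejected[] instead of being persisted.
function partitionBySoloTime(proposals) {
  const accepted = [];
  const rejected = [];
  for (const p of proposals) {
    const hours = Number(p.solo_time_estimate);
    if (Number.isFinite(hours) && hours > 0 && hours <= MAX_SOLO_HOURS) {
      accepted.push(p);
    } else {
      rejected.push({
        proposal: p,
        reason: `solo_time_estimate=${p.solo_time_estimate} is invalid or above ${MAX_SOLO_HOURS}h`,
      });
    }
  }
  return { accepted, rejected };
}
```

Keeping the cap in a single constant is one way to avoid the prompt and the validator drifting apart again.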
accepted with revision shadow mode GATE reversible: simple 6h
Applied to 43 S157 graduated candidates: hyp-2026-05-06-847f7e (0/5 on S157 manual review) is killed before argument; none of the 25 ROBUST candidates (4-5/5) are killed. Per the move cost rollup, argument + council_verdict + 7 deep moves average approximately $0.12-0.18 per hypo…
accepted with revision PROMPT reversible: simple 2h
Back-scoring hyp-2026-05-14-d3786b (Agronomy Advisory for UK soft-fruit and glasshouse growers — institutional trade-channel buyers) and hyp-2026-05-11-cc72cd (Bot-Promise Slip for B2B Support Ops — enterprise procurement buyers) with revised prompt produces solo_founder_feasible…

Deferred / rejected 47

Most recent 31 May 2026
filter rejected GATE reversible: simple 3h
Catches silent same-vendor judging cases that violate the S158 cross-vendor principle. Over 30 cycles, expect 0-3 enforcement-triggered rejudges (low because doctrine is mostly followed); a rate above 10% means the routing config has drifted, and correcting that config is the real fix.
filter rejected CORPUS reversible: simple 7h
Filter-kept rate of genesis output rises 10-25% within 30 cycles, because dead synthetic items stop being retrieved at uniform rate. Top-quartile synthetic items get retrieved 2-3x more often.
council rejected TOOL reversible: simple 6h
Cuts filter-kept rate of 847f7e-shape proposals by ≥50% relative to current baseline, without adding LLM cost. Over 30 cycles, ≥80% of detector-flagged items are also human-rated as empty.
filter rejected HARNESS reversible: simple 5h
Eliminates the silent 5-cycle genesis outage class observed in S180. Over 30 cycles, ≥98% of genesis attempts produce output. Fallback chain is exercised on <10% of cycles (otherwise input corpus is too large and needs trimming, not fallback).
filter rejected GATE reversible: simple 6h
Catches drift cases where argument constructs a sharp falsifier and council silently substitutes a softer one. Over 30 cycles, expect 5-15% of currently-approved proposals to be rejected for falsifier_drift, surfacing a class of error that pending_commander review currently absor…
filter rejected AXIS reversible: simple 4h
On the 43 historically-graduated candidates, 847f7e-shape proposals (empty noun phrases) score 0-1 on a11 while ROBUST candidates score 2-3, producing a 2-3 point spread on the composite. Filter-kept rate of 847f7e-shape drops ≥30%.
filter rejected AXIS reversible: simple 4h
Proposals grounded only in stale Corpus T (e.g. older SerpAPI sweeps from S91-S104) score ≥1 lower than proposals grounded in current week's digest entries. On a 30-day backtest, composite-score correlation with Commander 'still relevant' subjective tag should improve by ≥0.15 Pe…
council rejected HARNESS reversible: simple 6h
Genesis-induced cycle failures (silent outage class — see fix in commit 0f2d20d for Bedrock Opus 4.6) drop to zero over 60 days. Downstream filter_score.js never receives a malformed proposal in production. Retry rate sits between 5-15% (signal that the validator is firing) but f…
filter rejected GATE reversible: simple 4h
The 2 fatal_objection_both_confirm kills in the current trace would have skipped council, saving $0.20 across those cycles. Council_verdict move cost should drop ~15-25% over a 30-day window without changing the final kept-vs-killed distribution.
filter rejected AXIS reversible: simple 8h
Across the next 20 cycles, candidates that the engine would have produced and council-passed but Commander would have killed should score ≤1 on v2_a11 at least 60% of the time. Composite score spread between Commander-KILL-shape candidates and Commander-pass-shape candidates ≥ 1.…
filter rejected GATE reversible: simple 5h
On the 5-day Phase 1 trace (7× v2_backfill_orphan + 1× structural_duplicate_15ed71), at least 2-3 of those kills migrate from post-argument to pre-argument, saving ~$0.20-0.33 in argument cost per Phase 1 week. Argument move cost drops measurably in the next cycle telemetry.
council rejected TOOL reversible: simple 2h
Report reveals exact count of live orphaned rows (expected >0 given 7 kills in 7-day window with kill_reason='v2_backfill_orphan_S148'). Provides move_waste count per orphan to quantify GATE proposal value before implementation. If orphan_count=0, the GATE proposal (P1) is moot —…
filter rejected PROMPT reversible: simple 3h
When two proposals both target the same file and modify overlapping function blocks, the judge flags composability=major on at least one. Across 30 meta_filter_score runs (current observed volume), this surfaces implementation ordering conflicts before they reach Commander approv…
filter rejected PROMPT reversible: simple 2h
Genesis output rate for GW01-shaped proposals drops to <10% (estimated baseline ~23%, derived from 4 Commander kills against ~17 non-orphan proposals in the 7-day kill window). This is the earliest intervention point — preventing generation rather than catching proposals post-hoc…
filter rejected AXIS reversible: simple 4h
The 4 Commander-killed hypotheses (a38d31 = AI Control Failure Forecast Audit, e9cb5c = Reality-Graded Upgrade Gate, c89a71 = ClaimGate for B2B SaaS, 6bf9c5 = AI Tool Claim Verification) all score 0 on v2_a11. Graduated candidates score 2-3. The axis adds a 2-3 point composite sp…
rejected GATE reversible: simple 4h
The 7 v2_backfill_orphan_S148 kills (58% of all kills in 7-day window) are pre-empted before any moves are dispatched. Each orphaned row currently burns evidence_search, red_team_kill, and steelman moves before the kill fires — at the observed move:kill ratio (~360 moves / 12 kil…
filter rejected PROMPT reversible: simple 3h
With genesisRunCount ≈33 (33 days × ~1 run/day since 2026-04-19), S112 block is absent from PROPOSER_SYSTEM on next run. PROPOSER_SYSTEM character count drops by approximately the S112 block length (~900 chars, lines 88-106). Over the next 10 genesis runs without S112: if post-ho…
filter rejected CORPUS reversible: simple 4h
The next meta genesis run after deploy will include the 3 recent Commander KILLs (ClaimGate, AI Tool Claim Verification, AI Control Failure) and 1 DEFER (Upgrade Gate) as explicit Corpus E items. At least 1 proposal in that run should cite 'commander_override' in its evidence fie…
council rejected PROMPT reversible: simple 2h
Genesis (Anthropic/Sonnet 4.6) and the optimist reviewer (now OpenAI/Codex) are different vendors. On re-scoring 10 recent hypotheses with both versions, mean high-low spread per axis should widen by ≥0.15 across at least 3 of 5 axes, indicating the reviewers are less correlated.…
filter rejected TOOL reversible: simple 5h
Running node hypothesis_engine/scripts/backtest_filter_axes.js --axis=v2_a11 --labels=s157_labels.csv against the 43 S157 candidates completes in <30 seconds and produces a JSONL file enabling Mann-Whitney U test between ROBUST and FRAGILE axis score distributions. This replaces …
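For reference, the Mann-Whitney U statistic that such a JSONL file would feed can be computed as below. This is a sketch only: the function name and return shape are hypothetical, it computes the U statistic with average ranks for ties, and it does not compute a p-value. It makes no claim about what backtest_filter_axes.js itself contains.

```javascript
// Mann-Whitney U statistic for two samples of axis scores.
// Ties receive the average of the ranks they span.
function mannWhitneyU(a, b) {
  const pooled = [
    ...a.map((v) => ({ v, g: 0 })),
    ...b.map((v) => ({ v, g: 1 })),
  ].sort((x, y) => x.v - y.v);

  let rankSumA = 0;
  let i = 0;
  while (i < pooled.length) {
    // Extend j to the end of the current run of tied values.
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].v === pooled[i].v) j++;
    const avgRank = (i + 1 + (j + 1)) / 2; // ranks are 1-based
    for (let k = i; k <= j; k++) {
      if (pooled[k].g === 0) rankSumA += avgRank;
    }
    i = j + 1;
  }

  const uA = rankSumA - (a.length * (a.length + 1)) / 2;
  const uB = a.length * b.length - uA;
  return { uA, uB, u: Math.min(uA, uB) };
}

// Illustrative scores only, not S157 data.
const example = mannWhitneyU([3, 3, 2], [0, 1, 1]);
console.log(example.u); // 0: the two samples do not overlap at all
```

A `u` of 0 means complete separation between the groups, and values near half of `a.length * b.length` mean the axis does not distinguish them.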
council rejected PROMPT reversible: simple 2h
Within 20 genesis runs post-deployment, ≥50% of generated hypotheses will contain the phrase 'exception class' or 'route to' or 'does not handle automatically' in their description field. Baseline rate is 2.3% (1/43 in S157 NBJ sweep). A grep over genesis output logs is sufficien…
filter rejected GATE reversible: simple 6h
Flag rate in first 30 days should fall between 5-15% of argument runs. Flagged hypotheses should show lower graduation rates than unflagged over a 60-day observation window. If the commander-killed AI control / ClaimGate / AI Tool Verification for Agencies hypotheses are run thro…
filter rejected GATE reversible: simple 3h
After 30 days, meta_engine/data/non_convergent/ will contain a corpus of ESCALATED transcripts. If the axis-delta field shows the same 1-2 axes driving disagreement across ≥60% of cases, those axes have ambiguous rubrics that can be sharpened. If zero files appear despite ESCALAT…
filter rejected AXIS reversible: simple 4h
The 3 commander-KILLed proposals in the last 7 days (AI control, ClaimGate, AI Tool Verification for Agencies) would score 0 on this axis due to entrenched audit/verification incumbents with regulatory switching costs. Retrospective on 43 S157 NBJ candidates should show statistic…
filter rejected GATE reversible: simple 6h
After 60 days of shadow logging, score ≥3 correlates with ≥70% eventual KILL verdicts. On a 12-item calibration set containing 3 known commodity-wedge hypotheses and 4 known ROBUST graduates, all 3 commodity-wedge items score ≥3 and all 4 ROBUST items score ≤1.
council rejected PROMPT reversible: simple 4h
Re-running escalated hypotheses 7199a9 and 2ca131 through the updated prompt each produces ≥2 distinct, non-overlapping testable questions naming specific observables (e.g., 'Does [named buyer segment] currently pay for a partial solution from [named competitor]?' rather than 'Is…
filter rejected AXIS reversible: simple 5h
d3786b-shaped hypotheses (cold outbound dependency, no owned channel) score v2_a7=0. ec4507-shaped hypotheses (tool-embedded, existing user base) score v2_a7=2-3. A 4-candidate test set spanning the distribution reachability spectrum produces a score range of ≥3 points on this ax…
filter rejected AXIS reversible: simple 5h
Of the 7 hypotheses killed in the last 7 days for 'wrong distribution shape or pain framing,' at least 5 score v2_a6 ≤1. ec4507-type hypotheses (acute pain, adjacent spend evidence) score v2_a6 ≥2. The pre-council kill rate for candy-shaped hypotheses increases measurably, reduci…
filter rejected TOOL reversible: simple 3h
Every future council escalation produces a persisted JSONL record. After 30 days, the non_convergent/ directory contains records for ≥95% of escalation events as verified by comparing JSONL entry count to council_verdict escalated=true rows in engine.db.
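The ≥95% coverage check above reduces to comparing two counts. A minimal sketch, assuming the JSONL content has already been read into a string and the count of escalated=true rows has already been queried from engine.db; `escalationCoverage` and its return fields are hypothetical names.

```javascript
// Compares persisted JSONL records against the number of escalated verdict rows.
// Assumption: one JSONL line per escalation event; blank lines are ignored.
function escalationCoverage(jsonlText, escalatedRowCount) {
  const records = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line)); // throws on a malformed record
  // With no escalations there is nothing to miss, so coverage is treated as full.
  const coverage =
    escalatedRowCount === 0 ? 1 : records.length / escalatedRowCount;
  return {
    recorded: records.length,
    expected: escalatedRowCount,
    coverage,
    meetsTarget: coverage >= 0.95,
  };
}
```

Parsing each line, rather than only counting newlines, means a truncated or corrupt record surfaces as an error instead of inflating the count.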
filter rejected HARNESS reversible: medium 8h
5 of 9 recent council verdicts (56%) contain explicit temporal deferral conditions ('run Week 1 outbound before building', '7-day artifact-upload test', '7-day signal check'). These hypotheses currently sit in GATHER_MORE_SIGNAL indefinitely with no automatic re-evaluation. After…
filter rejected GATE reversible: simple 6h
The 'RevOps Objection Taxonomy Normalizer' shape (GPT-5.5-Pro Round 1: passes describability, observed-buyer, solo-inbound, yet still structurally weak on urgency and data advantage) flags commodity_wedge=true on axes 3+4+5. After 4 weeks of shadow: hypotheses where commodity_wed…
council rejected GATE reversible: simple 8h
Codex retrospective on 43 S157-scored candidates: gate kills hyp-2026-05-06-847f7e (S157 score 0/5, structurally fragile on all 5 Q dimensions) and does not kill any of the three 5/5 ROBUST candidates (ec4507, 24a849, 3656a0). Spearman rank correlation between gate composite_scor…
council rejected TOOL reversible: simple 4h
Current non-convergence rate: 22% (2 of 9 recent verdicts = 'council could not converge after 3 rounds'). At 9 verdicts/week, 30 days produces ~8–10 transcripts. The escalationReason field already distinguishes FACTUAL vs WEIGHTING vs FRAMING disagreements. The corpus enables the…
council rejected ARCHITECTURE reversible: medium 12h
The 4 Commander-killed hypotheses (a38d31, c89a71, 6bf9c5, e9cb5c) should score ≤18/30 on v2 composite due to failing scalable_revenue (audit-shaped), distribution_reachability (no warm-contact base noted in override reason), and commander_non_engine_work_fit. The v1 composite sc…
filter rejected PROMPT reversible: simple 4h
Every GATHER_MORE_SIGNAL verdict (currently ~55% of verdicts per S157 distribution: 25 ROBUST + 13 MIXED + 4 FRAGILE + 1 STRUCTURALLY FRAGILE out of 43) produces a machine-readable gate condition that Commander can execute without re-reading the full reasoning. The 2 non-converge…
council rejected PROMPT reversible: simple 3h
Genesis outputs for commodity or evergreen problems (e.g. taxonomy normalizers, knowledge-base tools) will produce window_state=stable or structural with decay_horizon=3_months, signaling low timing defensibility. Proposals tied to genuine substrate shifts (agent-era trust gaps, …
council rejected PROMPT reversible: simple 2h
Argument transcripts will contain named companies in attacker rounds, enabling council to distinguish theoretical objections from documented failure patterns. The RevOps taxonomy shape (S158 Round 1 survivor shape that passes all describability/reachability checks) should generat…
council rejected PROMPT reversible: simple 3h
The 5 recent council verdicts (5d7cca, 26fc18, cc72cd, 90778c, c27754) each independently invented ad-hoc week-1 tests. Post-change, genesis outputs carry those tests, so council can evaluate their credibility rather than invent them. 10 consecutive genesis outputs should contain…
filter rejected PROMPT reversible: medium 8h
Hypotheses like Commander-KILLED a38d31 (audit product, no warm-contact base) and c89a71 (ClaimGate, relational sales) should score A5=0 (scalable_revenue: pure audit service) and A7=0–1 (distribution_reachability: warm intros required), producing v2 composites below 40% even if …
council rejected GATE reversible: simple 6h
RevOps Objection Taxonomy Normalizer shape (CRM-integrated taxonomy, no urgency event, dashboard deliverable) flags commodity_wedge_recommendation=true. hyp-2026-05-06-ec4507 (Support Escalation: SLA deadline forcing function, Zendesk timestamp as external ground truth, not CRM-d…
council rejected PROMPT reversible: simple 2h
hyp-2026-05-06-847f7e (Support Promise Calibration Console — killed because 'CSAT/SLA outcomes are multi-causal') scores 0-1 on fast_feedback_loops under the revised rubric. hyp-2026-05-13-47730e (AI Portfolio Claim Auditor — killed because 'board verdicts multi-causal') scores 0…
council rejected PROMPT reversible: simple 2h
After 20 genesis runs post-patch, at least 30% of hypothesis descriptions contain explicit exception-class language ('The workflow excludes...', 'Human review is required when...', or equivalent). Current S157 baseline: 1/43 graduated candidates (2.3%) explicitly named exception …
filter rejected AXIS reversible: simple 5h
RevOps Objection Taxonomy Normalizer shape (taxonomy/analytics, CRM-integrated, no named urgency event) scores 0-1. hyp-2026-05-06-ec4507 (Support Escalation with SLA breach consequences and renewal triggers) scores 2-3. Retrospective application to 43 S157-scored candidates show…
filter rejected TOOL reversible: simple 3h
After 30 days of active council cycles (current rate ~9/week, empirical non-convergence rate 22%), directory accumulates 8-12 non-convergent transcripts. This corpus enables first structured analysis of split-reason taxonomy: whether splits cluster by hypothesis type, ICP, or mod…
deferred GATE reversible: simple 5h
Of 9 recent deep_council_verdict runs, at least 3 of the 5 killed hypotheses would be caught here: d3786b (Agronomy Advisory — no observable paying buyer for AI-powered agronomy ledgers), c27754 (Medical-Device SME buying AI components — buyer leverage unverified), cc72cd (Bot-Pr…
deferred PROMPT reversible: simple 3h
After 2 genesis cycles (~62 new hypotheses at current 31/week throughput), at least 55% of generated proposals include a non-empty exception_classes field with ≥2 distinct named situations (not paraphrases). The companion exception_class_named axis (Proposal 1) will show mean sco…
deferred AXIS reversible: simple 5h
Applied retroactively to 43 S157-graduated candidates: hyp-2026-04-19-3656a0 (ec4507, cited explicit scope exclusions) scores 2-3; hyp-2026-05-06-847f7e (zero exception classes anywhere) scores 0; at least 38 of 43 score 0-1, producing a minimum 2-point composite spread between R…