Require empirical failure analog in argument.js red_team_kill attacker output

council rejected · PROMPT · reversible: simple · 2h · proposed 19 May 2026
What is the proposed change?
In the argument.js red_team_kill attacker SYSTEM prompt, add after the core adversarial mandate: 'EMPIRICAL ANCHOR RULE: Each attack in `kill_points` MUST be paired with at least one named real-world company, product, or startup that encountered the specific failure mode you are describing. Use the format: (CompanyName, ~Year: what they tried, how it failed or stalled). You may use your bounded web search allocation to find this evidence. If no direct analog exists, explicitly state "No documented analog" and explain whether its absence strengthens or weakens the attack. Generic phrases such as "many companies have tried this" or "incumbents will copy it" without a named entity do not satisfy this requirement.' Update the `kill_points` field description from `["point 1", "point 2", "point 3"]` to `["[CompanyName ~Year: analog] point text", ...]` to encode the expectation in the schema itself.
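The format rule above can be checked mechanically. The following is a minimal sketch, not code from argument.js: `checkKillPoint` and `checkKillPoints` are hypothetical names, the regular expression is one possible reading of the `[CompanyName ~Year: analog] point text` shape, and `ExampleCo` is a placeholder rather than a real analog.

```javascript
// Hypothetical format check for the proposed kill_points shape.
// Assumption: each entry is either "[CompanyName ~Year: analog] point text"
// or an explicit "No documented analog" statement followed by an explanation.
const ANALOG_PREFIX = /^\[([^\]~]+?)\s*,?\s*~\s*(\d{4})\s*:\s*([^\]]+)\]\s*(\S[\s\S]*)$/;
const NO_ANALOG_PHRASE = "no documented analog";

function checkKillPoint(point) {
  const text = String(point).trim();

  // Case 1: a named entity with an approximate year and an analog description.
  const m = ANALOG_PREFIX.exec(text);
  if (m) {
    return { ok: true, kind: "analog", company: m[1].trim(), year: Number(m[2]) };
  }

  // Case 2: an explicit "No documented analog" that is followed by an explanation.
  const idx = text.toLowerCase().indexOf(NO_ANALOG_PHRASE);
  if (idx !== -1) {
    const explanation = text
      .slice(idx + NO_ANALOG_PHRASE.length)
      .replace(/^[\s"'.:;,-]+/, "");
    return { ok: explanation.length > 0, kind: "no_analog" };
  }

  // Case 3: a generic claim with no named entity does not satisfy the rule.
  return { ok: false, kind: "unanchored" };
}

function checkKillPoints(killPoints) {
  const results = killPoints.map(checkKillPoint);
  return { allAnchored: results.every((r) => r.ok), results };
}

// Usage with placeholder data:
const report = checkKillPoints([
  "[ExampleCo ~2020: shipped a CRM plugin, stalled after the platform bundled it] The integration wedge is saturated.",
  "No documented analog. Absence weakens the attack because the market is new.",
  "Many companies have tried this.",
]);
console.log(report.allAnchored, report.results.map((r) => r.kind));
```

A check like this only confirms the shape of the text. It cannot tell whether the named company exists or whether the described failure actually happened, so it complements council review rather than replacing it.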
Target files
hypothesis_engine/moves/argument.js
Expected effect
Argument transcripts will contain named companies in attacker rounds, enabling the council to distinguish theoretical objections from documented failure patterns. The RevOps taxonomy shape (the S158 Round 1 survivor shape that passes all describability/reachability checks) should generate kill_points citing Gong, Clari, or Chorus analogs for the 'saturated integration wedge' failure mode. Council reasoning will reference analogs in at least 40% of verdicts post-change. The existing bounded web search (2–3 queries) is already available to the attacker; this change directs those queries toward documented failures rather than generic market demand.
Falsifier — what would prove this wrong?
Review 10 consecutive red_team_kill move outputs post-change. The change holds if, in at least 7 of 10, the attacker names at least one specific company or product per kill_point (not just a category), and if, in at least 4 of 10, the council_verdict references the named analog in its `reasoning` field. Falling short of either threshold counts against the proposal. If the attacker consistently produces 'No documented analog' for hypotheses in well-documented markets (CRM, RevOps, B2B SaaS support tooling), the instruction alone is insufficient and few-shot examples of analog-finding need to be injected alongside it.
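The review described above could be tallied with a short script. This is a sketch under stated assumptions: the record shape `{ kill_points, council_verdict: { reasoning } }` is inferred from the field names in this proposal, `evaluateFalsifier` and `namedCompanies` are hypothetical helpers, and matching a company name by substring in `reasoning` is a crude stand-in for a human reading the verdict.

```javascript
// Hypothetical tally of the falsifier thresholds over a window of move outputs.
// Assumed record shape: { kill_points: string[], council_verdict: { reasoning: string } }.
const ANALOG = /^\[([^\]~]+?)\s*,?\s*~\s*\d{4}\s*:/;

// Returns one company name per kill point, or null where none is named.
function namedCompanies(killPoints) {
  return killPoints.map((p) => {
    const m = ANALOG.exec(String(p).trim());
    return m ? m[1].trim() : null;
  });
}

// Default thresholds (7 and 4) assume a window of 10 outputs, as in the falsifier text.
function evaluateFalsifier(outputs, minAttacker = 7, minCouncil = 4) {
  let attackerHits = 0;
  let councilHits = 0;

  for (const out of outputs) {
    const names = namedCompanies(out.kill_points || []);

    // Attacker criterion: every kill point in this output names an entity.
    if (names.length > 0 && names.every((n) => n !== null)) attackerHits += 1;

    // Council criterion: the verdict reasoning mentions at least one named entity.
    const reasoning = ((out.council_verdict || {}).reasoning || "").toLowerCase();
    if (names.some((n) => n !== null && reasoning.includes(n.toLowerCase()))) {
      councilHits += 1;
    }
  }

  return {
    reviewed: outputs.length,
    attackerHits,
    councilHits,
    passes: attackerHits >= minAttacker && councilHits >= minCouncil,
  };
}
```

Calling `evaluateFalsifier(lastTenOutputs)` returns the two counts alongside a single `passes` flag, so a reviewer can see which threshold was missed. The 'No documented analog' pattern in well-documented markets is left to manual review here, since deciding whether a market is well documented is a judgment call.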
Evidence that triggered the proposal
  • Corpus D, META_ENGINE_S158_ROUND2_SYNTHESIS.md: 'Both rounds + both reviewers + the same diagnosis. Every paper-rubric we add is gameable by the proposer. Without empirical falsifiers running, we cannot tell whether rubrics actually discriminate quality from text-shape.'
  • Corpus D, META_ENGINE_S158_RED_TEAM_BRIEF.md: RevOps Objection Taxonomy Normalizer 'passes describability, passes observed-buyer, passes solo-inbound — still bad because saturated integration wedge, no defensible data advantage, weak urgency' — theoretical argument.js attacks cannot surface this without named Gong/Clari analogs
  • Corpus D, red_team_reviews/hypothesis-engine-v1-round1-gpt-5.4-pro.md: 'Dossier Context Poisoning — LLMs suffer from agreement bias; persuasive steelman first → subsequent red team anchors to narrative.' Named analogs anchor the attacker to external reality, counteracting the agreement-bias failure mode

Proposer self-score

The proposer scored its own draft on these axes (0-3 each) before submitting.

Axis            Score
specificity     3
falsifier       2
solo feasible   3
blast radius    3
composability   3
reversibility   3
Disposition
Rejected at the council verdict. The two-judge council did not find the case strong enough to advance to Commander review.

Evaluation history

When               Move
2026-05-23 04:33   meta_council_verdict
2026-05-23 04:19   meta_argument
2026-05-19 12:48   red_team_kill
2026-05-19 11:06   steelman
2026-05-19 10:06   meta_filter_score
2026-05-19 09:53   meta_genesis