Calibration

The engine's track record, scored against itself. These are the numbers a customer should look at before trusting any verdict.

Headline metrics

Total hypotheses: 122
Graduation rate (decided): 33%
Commander override rate: 9%
Avg cost per hypothesis: $2.21
Override rate is the percentage of graduated-or-overridden cases in which the human disagreed with the engine. A high rate means the engine is missing something the human catches; a low rate means the engine is well-calibrated. Currently 9%, just below the target band of 10-25%.
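A minimal sketch of the override-rate calculation and the band check, assuming "graduated-or-overridden cases" means two disjoint counts added together (graduations the commander let stand, plus cases the commander overrode). The function names `override_rate` and `band_status` are hypothetical, not part of the engine.

```python
# Hypothetical sketch of the override-rate metric.
# Assumption: the denominator is overridden + graduated, treated as disjoint sets.

TARGET_BAND = (0.10, 0.25)  # the 10-25% target band stated in the text


def override_rate(overridden: int, graduated: int) -> float:
    """Share of graduated-or-overridden cases where the human disagreed."""
    decided = overridden + graduated
    if decided == 0:
        return 0.0  # no decided cases yet, so nothing to disagree with
    return overridden / decided


def band_status(rate: float, band: tuple[float, float] = TARGET_BAND) -> str:
    """Classify a rate as 'below', 'within', or 'above' the target band."""
    low, high = band
    if rate < low:
        return "below"
    if rate > high:
        return "above"
    return "within"


# 4 overrides (DEFER 1 + KILL 3) against 43 graduations, under the assumption above.
rate = override_rate(overridden=4, graduated=43)
print(f"{rate:.0%} ({band_status(rate)} target band)")
```

Under this reading, 4 / 47 is about 8.5%, which displays as 9% and sits below the 10% floor of the band.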

Filter score distribution (graduated)

Where the 43 graduated hypotheses fall on the composite filter score (out of 15).

Score band   Count
9.0-9.9      13
10.0-10.9    25
11.0-11.9    4
12.0+        1
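A sketch of how graduated scores could be bucketed into these bands, assuming scores are floats on the 0-15 composite scale, each band below 12 is keyed by the integer part of the score, and 12.0+ is an open top band. The `<9.0` label and the sample scores are hypothetical; no graduated hypothesis in the table falls below 9.0.

```python
import math
from collections import Counter

# Hypothetical bucketing of composite filter scores into display bands.


def score_band(score: float) -> str:
    """Map a score to its band label, keyed by the score's integer part."""
    if score >= 12.0:
        return "12.0+"  # open-ended top band
    if score < 9.0:
        return "<9.0"   # hypothetical label; not present in the reported table
    whole = math.floor(score)  # e.g. 10.95 -> 10 -> "10.0-10.9"
    return f"{whole}.0-{whole}.9"


def band_counts(scores: list[float]) -> Counter:
    """Count how many scores land in each band."""
    return Counter(score_band(s) for s in scores)


# Counts from the table above; together they account for all 43 graduations.
reported = {"9.0-9.9": 13, "10.0-10.9": 25, "11.0-11.9": 4, "12.0+": 1}
assert sum(reported.values()) == 43

print(band_counts([9.4, 10.2, 10.95, 11.5, 13.0]))  # made-up sample scores
```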

Commander overrides

Action   Count
DEFER    1
KILL     3

Why hypotheses get killed

Reason                              Count
evidence_search_exhausted           24
move_cap_reached                    13
v2_backfill_orphan_S148             7
fatal_objection_both_confirm        2
council_verdict_unanimous_kill      2
structural_duplicate_15ed71_S148    1
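The reason table can be turned into shares with a few lines of Python. This sketch copies the counts above verbatim; `reason_shares` is a hypothetical helper. The listed reasons cover 49 kills, and the top two (`evidence_search_exhausted` and `move_cap_reached`) account for 37 of them, about 76%.

```python
from collections import Counter

# Kill reasons and counts copied from the table above (reason codes verbatim).
kill_reasons = Counter({
    "evidence_search_exhausted": 24,
    "move_cap_reached": 13,
    "v2_backfill_orphan_S148": 7,
    "fatal_objection_both_confirm": 2,
    "council_verdict_unanimous_kill": 2,
    "structural_duplicate_15ed71_S148": 1,
})


def reason_shares(counts: Counter) -> list[tuple[str, float]]:
    """Return (reason, share of all listed kills), most frequent first."""
    total = sum(counts.values())
    return [(reason, n / total) for reason, n in counts.most_common()]


total_killed = sum(kill_reasons.values())
for reason, share in reason_shares(kill_reasons):
    print(f"{reason:<34} {share:6.1%}")
```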

Cost transparency

Total engine spend across all moves: $270.12 across 3,474 logged operations. Average cost per hypothesis from admission to current state: $2.21.
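A quick arithmetic check, assuming the average divides total spend evenly across all 122 hypotheses (this reproduces the reported $2.21). It uses `Decimal` to avoid binary floating-point rounding on currency; `per_unit` is a hypothetical helper, and the per-operation figure is derived here rather than reported by the engine.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

total_spend = Decimal("270.12")  # total engine spend across all moves
operations = 3474                # logged operations
hypotheses = 122                 # total hypotheses (headline metric)


def per_unit(total: Decimal, n: int, step: Decimal = CENT) -> Decimal:
    """Divide a money total by a count and round half-up to `step`."""
    return (total / n).quantize(step, rounding=ROUND_HALF_UP)


avg_per_hypothesis = per_unit(total_spend, hypotheses)
avg_per_operation = per_unit(total_spend, operations, Decimal("0.001"))
print(avg_per_hypothesis, avg_per_operation)  # prints: 2.21 0.078
```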

Known limitations