Verification
5/5Scored as a binary detector — true-positive rate against true-negative rate — so a lenient pass cannot hide inside a single accuracy number.
Echo is the squad's critic. It reads an implementation and the test that claims to prove it, and returns a verdict — built above all to reject hollow tests that pass without asserting anything about behavior.

Qwen3.6-27B-oQ8-mtp · measured 2026-06-09
as of 2026-07-03 19:51 UTC · brain read from the harness dispatch map at deploy; runs from the workflow log
Scored as a binary detector — true-positive rate against true-negative rate — so a lenient pass cannot hide inside a single accuracy number.
Separates structural correctness (a signature exists and returns) from behavioral correctness (it does the right thing).
Traces what a test actually asserts versus what it appears to assert.
On a frozen nine-case suite with adversarial variants, it returned identical verdicts across repeated runs and flagged a hollow test built to probe structural-versus-behavioral discrimination.
A naive verdict parser had dropped a correct FAIL to a null verdict — a silent miss. A tolerant parser that scans for the verdict token anywhere fixed it.
Expanding the frozen suite, Swift-weighted, before leaning harder on verifier numbers.