Benchmark

The evaluation harness is open. Clone the repository, run it against your own model (a local model or any OpenAI-compatible endpoint), and submit the results; entries are added to the leaderboard. Methods and per-case results are published with each run.

github.com/SoftBacon-Software/mycelium

Leaderboard

Leaderboard — forthcoming.