Mycelium

References

This work stands on prior research, an open tabletop tradition, and the open-source local-AI stack it runs on. Everything we use is linked and credited here. Citations are verified; if one is wrong, that is a bug to fix — open an issue.

Benchmark methodology

RPGBench — arXiv:2502.00595
Evaluates a language model as a role-playing-game engine, that is, in the Dungeon-Master role. The direct basis for our objective Game-Simulation metrics (MEC / ECE / VUE). Our one addition is to compute those metrics against an independent rules engine rather than the model's self-report, which is stricter and reproducible.
Minions — Narayan, Biderman, Eyuboglu, May, Linderman, Zou, Ré (Hazy Research, Stanford) — arXiv:2502.15964
The frontier↔local collaboration framework and the local-offload idea — how much work the local model carries instead of the frontier. This is the prior-art anchor for our cost-savings metric and the persona-meld thesis underneath the squad. Code: github.com/HazyResearch/Minions. Lead author Avanika Narayan and OpenJarvis's Jon Saad-Falcon publicly share the framing this work builds on — “AI inference should be local by default, hybrid by design.” We credit the people, not just the papers.
Jericho — github.com/microsoft/jericho
A learning environment for text adventure games (Microsoft Research). A reference for the player side of evaluation — whether a model can play — and for the Gym-style, contamination-aware harness discipline a benchmark needs to stay out of training sets.

The tabletop tradition

Dungeons & Dragons — 5e System Reference Document
The rules lineage our engine implements, used under the SRD's open license (Wizards of the Coast). The mechanics MOTU resolves deterministically — checks, combat, conditions — descend from the 5e SRD.
Genre traditions
The worlds our cartridges draw on lean on the wider tabletop canon (Shadowrun, Cyberpunk RED, Starfinder, Eberron) to show that one ruleset can carry many settings. The Canopy Heart cartridge is solarpunk by way of that lineage.

Open source we run on

Qwen3.6 / Qwen3-Coder — the Qwen team, Alibaba — github.com/QwenLM/Qwen3-Coder
The local model weights the squad runs on — the planner, coder, and critic brains behind every result published here.
MLX — Apple (ml-explore) — github.com/ml-explore/mlx
The Apple-Silicon array framework underneath the serving stack — what lets these models run on hardware we own.
oMLX — jundot — github.com/jundot/omlx
The OpenAI-compatible MLX server that runs the models locally — and makes the per-token throughput and energy measurements on the agent pages possible.
OpenJarvis — Jon Saad-Falcon (lead) and contributors, Stanford — github.com/open-jarvis/OpenJarvis
“Personal AI, on personal devices.” The on-device agent lineage our research line builds on — the Apple-FM engine path and model routing. We use their work, and we want them to win.

Collaboration

Anthropic / Claude — the frontier collaborator
The squad exists to do the bulk of the work locally; Claude is the orchestration, the specs, and the taste. We say so plainly: this was built with a frontier model, deliberately reserved for judgment so the rest could run on hardware we own.

Influenced this and not listed? That is an omission to fix, not a judgment — open an issue or tell us.