S02026-06-02

Restoring vision in a quantized planner

quantization
vision
planner
mtp

The planner agent's initial quantization shipped without the multi-token-prediction (MTP) weights its vision path required, which left the agent text-only. The agent did not lack the capability; a quantization detail removed its access to it.

Observation

After the initial 8-bit quantized checkpoint was deployed, the agent treated every input as text, whether or not images were attached. Inspecting the quantized weights confirmed that the MTP-associated vision pathway was missing from the checkpoint: the standard quantization pipeline had not included those layers.
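One way to perform that kind of inspection, sketched here under assumptions: if the checkpoint is a single .safetensors file, its tensor names can be read from the file header alone, without loading any weights. The helper names and the prefixes checked ("mtp.", "visual.") are hypothetical; real checkpoints name these groups according to their own architecture.

```python
import json
import struct


def read_tensor_names(path):
    """Return the tensor names stored in a .safetensors file.

    A safetensors file begins with an 8-byte little-endian unsigned
    integer giving the header length, followed by a JSON header that
    maps tensor names to dtype/shape/offsets. The optional
    "__metadata__" entry is not a tensor, so it is skipped.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sorted(name for name in header if name != "__metadata__")


def missing_groups(names, expected_prefixes):
    """Return each expected prefix that no tensor name starts with.

    The prefixes are hypothetical examples (e.g. "mtp.", "visual.");
    substitute whatever the model's architecture actually uses.
    """
    return [p for p in expected_prefixes
            if not any(n.startswith(p) for n in names)]
```

Applied to the original 8-bit checkpoint, a call such as missing_groups(read_tensor_names(path), ["mtp."]) would be expected to report the MTP prefix if those layers had been dropped. A sharded checkpoint needs the same check repeated across its shards.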

Re-quantization

A second pass explicitly included the MTP weights and produced the oQ8-mtp checkpoint. It replaced the original and was registered with the local server, with the KV cache quantized to leave memory headroom on a single machine:

// model_settings.json (per-model)
"Qwen3.6-27B-oQ8-mtp": {
  "turboquant_kv_enabled": true,
  "turboquant_kv_bits": 4,
  "turboquant_skip_last": true
}
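To see why a 4-bit KV cache buys headroom, the cache size can be estimated directly: keys and values are stored for every layer, KV head, head dimension, and token position. The sketch below assumes a plain per-element bit width and uses a hypothetical model shape, not the deployed model's actual architecture. It ignores quantization scale overhead and does not model turboquant_skip_last, whose exact behavior is not described here.

```python
GIB = 1024 ** 3


def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits):
    """Bytes needed to hold keys and values for `tokens` positions.

    The factor 2 covers K and V; dividing by 8 converts bits to bytes.
    Per-group scale/offset metadata that real quantizers add is ignored.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bits // 8


# Hypothetical shape for illustration only.
shape = dict(layers=64, kv_heads=8, head_dim=128, tokens=32_768)
fp16 = kv_cache_bytes(**shape, bits=16)
q4 = kv_cache_bytes(**shape, bits=4)
print(f"16-bit KV: {fp16 / GIB:.1f} GiB, 4-bit KV: {q4 / GIB:.1f} GiB")
# prints "16-bit KV: 8.0 GiB, 4-bit KV: 2.0 GiB"
```

Under these assumptions, dropping from 16 to 4 bits cuts the cache by a factor of four; real savings are somewhat smaller once scale metadata is counted.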

In subsequent runs, MTP fired on 73–88% of tokens. The vision path became functional, so the agent could receive and reason over image inputs as intended, while the 4-bit KV cache let it stay resident alongside the rest of the squad.

Implication

Capability gaps in quantized models are not always attributable to the base model. Here the base model had the capability; the quantization removed the path to it. Auditing which weight groups a quantized checkpoint includes is a necessary step before declaring a capability absent.
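The audit recommended above can be sketched as a set difference between the weight groups of a base checkpoint and those of its quantized derivative. This assumes both are sharded Hugging Face-style checkpoints whose model.safetensors.index.json carries a "weight_map" from tensor name to shard file; the grouping depth and helper names are illustrative choices, not part of any particular tool.

```python
import json


def weight_groups(tensor_names, depth=2):
    """Collapse tensor names to their first `depth` dotted components.

    e.g. "model.layers.0.mlp.up_proj.weight" -> "model.layers" at depth 2.
    """
    return {".".join(name.split(".")[:depth]) for name in tensor_names}


def load_index_names(index_path):
    """Tensor names listed in a sharded checkpoint's index file."""
    with open(index_path) as f:
        return list(json.load(f)["weight_map"])


def groups_lost(base_names, quant_names, depth=2):
    """Weight groups present in the base checkpoint but absent after quantization."""
    return sorted(weight_groups(base_names, depth) - weight_groups(quant_names, depth))
```

Grouping at a coarse depth keeps the extra per-weight tensors a quantizer may add (scale entries, for example) from showing up as differences, while an entire missing group, such as an MTP head or a vision tower, still does.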