Multi-Token Prediction support has been patched into LLaMA.cpp, speeding up local Gemma 4 runs!
🚨 AI News | TestingCatalog (@testingcatalog), May 8, 2026
> The team quantized Gemma 4 assistant models into GGUF and tested them on a MacBook Pro M5 Max.
Gemma 4 26B with MTP draft tokens reportedly runs around 40% faster, suggesting a… https://t.co/pskUHcjSpl pic.twitter.com/sYtbuwWm6t