Let me demonstrate the true power of llama.cpp:
— Georgi Gerganov (@ggerganov) April 2, 2026
– Running on Mac Studio M2 Ultra (3 years old)
– Gemma 4 26B A4B Q8_0 (full quality)
– Built-in WebUI (ships with llama.cpp)
– MCP support out of the box (web-search, HF, github, etc.)
– Prompt speculative decoding
The result: 300 t/s (realtime video) pic.twitter.com/B3EnpbWJde
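The pieces listed in the tweet map onto a single `llama-server` invocation. A minimal sketch, assuming hypothetical GGUF filenames (the exact model files are not given in the post); `-m`, `-md`, `-ngl`, `-c`, and `--port` are standard `llama-server` flags, though the tweet's "prompt speculative decoding" may refer to a different mechanism than the classic draft-model speculation shown here:

```shell
# Placeholders: the exact GGUF filenames were not given in the post.
# -m:   main model (Q8_0 quantization, i.e. "full quality")
# -md:  draft model, enabling speculative decoding
# -ngl: number of layers to offload to the GPU (Metal on a Mac Studio)
# -c:   context size
llama-server \
  -m gemma-4-26b-a4b-q8_0.gguf \
  -md gemma-4-draft-q8_0.gguf \
  -ngl 99 \
  -c 16384 \
  --host 127.0.0.1 --port 8080
```

The built-in WebUI is served on the same port (http://127.0.0.1:8080), and the server also exposes an OpenAI-compatible API under `/v1/`.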