Let me demonstrate the true power of llama.cpp:
— Georgi Gerganov (@ggerganov) April 2, 2026
– Running on Mac Studio M2 Ultra (3 years old)
– Gemma 4 26B A4B Q8_0 (full quality)
– Built-in WebUI (ships with llama.cpp)
– MCP support out of the box (web-search, HF, github, etc.)
– Prompt speculative decoding
The result: 300 t/s (realtime video) pic.twitter.com/B3EnpbWJde
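The pieces listed in the tweet map onto a single `llama-server` invocation. A minimal sketch, assuming hypothetical GGUF filenames (the exact model files are not given in the post); `-m`, `-md`, `-ngl`, `-c`, and `--port` are standard `llama-server` flags, though the tweet's "prompt speculative decoding" may refer to a different mechanism than the classic draft-model speculation shown here:

```shell
# Placeholders: the exact GGUF filenames were not given in the post.
# -m:   main model (Q8_0 quantization, i.e. "full quality")
# -md:  draft model, enabling speculative decoding
# -ngl: number of layers to offload to the GPU (Metal on a Mac Studio)
# -c:   context size
llama-server \
  -m gemma-4-26b-a4b-q8_0.gguf \
  -md gemma-4-draft-q8_0.gguf \
  -ngl 99 \
  -c 16384 \
  --host 127.0.0.1 --port 8080
```

The built-in WebUI is served on the same port (http://127.0.0.1:8080), and the server also exposes an OpenAI-compatible API under `/v1/`.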