AI Dynamics

Global AI News Aggregator

llama.cpp achieves 300 tokens/second on Mac Studio M2 Ultra

Let me demonstrate the true power of llama.cpp:
– Running on Mac Studio M2 Ultra (3 years old)
– Gemma 4 26B A4B Q8_0 (full quality)
– Built-in WebUI (ships with llama.cpp)
– MCP support out of the box (web-search, HF, github, etc.)
– Prompt speculative decoding

The result: 300 t/s (realtime video)

→ View original post on X — @julien_c, 2026-04-02 17:11 UTC
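
The post does not explain what "prompt speculative decoding" does. One common reading, assumed here, is that draft tokens are copied from earlier text in the context whenever the most recent tokens repeat a sequence seen before, and the model then verifies the whole draft in a single pass. The sketch below illustrates that idea only. It is not llama.cpp's implementation, and the names `draft_from_prompt` and `accept_prefix` and the parameters `ngram` and `max_draft` are hypothetical.

```python
from typing import List, Sequence


def draft_from_prompt(tokens: Sequence[int], ngram: int = 3, max_draft: int = 8) -> List[int]:
    """Propose draft tokens by reusing text that followed an earlier copy of the trailing n-gram.

    Hypothetical helper: a simplified illustration, not llama.cpp's code.
    """
    if len(tokens) < ngram + 1:
        return []
    tail = tuple(tokens[-ngram:])
    # Scan backwards so the most recent earlier match wins; the starting index
    # excludes the trailing n-gram itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tuple(tokens[start:start + ngram]) == tail:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []


def accept_prefix(draft: Sequence[int], verified: Sequence[int]) -> int:
    """Count the leading draft tokens that agree with the model's own choices."""
    accepted = 0
    for drafted, chosen in zip(draft, verified):
        if drafted != chosen:
            break
        accepted += 1
    return accepted


if __name__ == "__main__":
    context = [1, 2, 3, 4, 5, 6, 1, 2, 3]
    draft = draft_from_prompt(context, ngram=3, max_draft=3)
    print(draft)
    print(accept_prefix(draft, [4, 5, 9]))
```

Under this assumption, the speedup depends on how often the output repeats the context: every accepted draft token is one the model did not have to generate in its own step, so repetitive workloads such as code edits or summaries benefit most, while novel text gains little.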
