AI Dynamics

Global AI News Aggregator

Nvidia Eyes Model Serving at 10,000-20,000 Tokens Per Second

Nvidia Chief Scientist Bill Dally says there is a path to serving relatively large models at 10,000 to 20,000 tokens per user per second. For context, Opus 4.6 runs at ~43 tokens/user/s and Grok 4.2 Beta at ~251 tokens/user/s 🤯
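For a rough sense of scale, the gap implied by those figures can be worked out directly. The sketch below is only arithmetic on the numbers quoted in the post; the function name `speedup_range` and the variable names are illustrative, not from the original.

```python
# Figures quoted in the post, in tokens per user per second.
TARGET_LOW, TARGET_HIGH = 10_000, 20_000
current = {"Opus 4.6": 43, "Grok 4.2 Beta": 251}


def speedup_range(current_rate, low=TARGET_LOW, high=TARGET_HIGH):
    """Return (min, max) speedup factors of the target range over current_rate."""
    return low / current_rate, high / current_rate


for name, rate in current.items():
    lo, hi = speedup_range(rate)
    print(f"{name}: {lo:.0f}x to {hi:.0f}x faster")
```

On these numbers, the target range works out to roughly 230x to 465x the quoted Opus 4.6 rate and roughly 40x to 80x the quoted Grok 4.2 Beta rate.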

→ View original post on X — @alexjc, 2026-04-04 17:43 UTC