Nvidia's Chief Scientist Bill Dally says there's a path to serving relatively large models at 10,000 to 20,000 tokens per user per second. For context, Opus 4.6 is ~43 and Grok 4.2 Beta is ~251 tokens/user/s 🤯 pic.twitter.com/mbZNFfWgUb
Marcelo P. Lima (@MarceloLima), April 4, 2026