🚨 Cerebras Inference is now 3x faster: Llama3.1-70B just broke 2,100 tokens/s
– 16x faster than the fastest GPU solution
– 8x faster than GPUs running Llama *3B*
– It's like the perf of a new hardware generation in a single software release
Available now at… pic.twitter.com/9VgGWGO6qY
— Cerebras (@cerebras) October 24, 2024