AI Dynamics

Global AI News Aggregator

Cerebras Inference 3x Faster: Llama 70B Reaches 2,100 Tokens/Second

Cerebras Inference is now 3x faster: Llama3.1-70B just broke 2,100 tokens/s
– 16x faster than the fastest GPU solution
– 8x faster than GPUs running Llama *3B*
– It's like the perf of a new hardware generation in a single software release
Available now at

→ View original post on X — @cerebras
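
The ratios in the post can be turned into rough baseline figures with simple division. The sketch below is illustrative arithmetic only: it assumes "Nx faster" means a ratio of tokens-per-second throughput, and the helper names (`implied_baseline`, `seconds_for`) are hypothetical. The derived numbers are implied by the post's rounded claims, not measured values.

```python
# Figures quoted in the post (throughput in tokens/s, speedups as ratios).
CEREBRAS_TOKENS_PER_S = 2100   # Llama3.1-70B after the update
SPEEDUP_VS_PREVIOUS = 3        # "now 3x faster"
SPEEDUP_VS_FASTEST_GPU = 16    # vs. the fastest GPU solution
SPEEDUP_VS_GPU_LLAMA_3B = 8    # vs. GPUs running Llama 3B


def implied_baseline(tokens_per_s: float, speedup: float) -> float:
    """Baseline throughput implied by an 'Nx faster' claim (assumed to be a throughput ratio)."""
    return tokens_per_s / speedup


def seconds_for(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` output tokens at a constant throughput."""
    return tokens / tokens_per_s


if __name__ == "__main__":
    previous = implied_baseline(CEREBRAS_TOKENS_PER_S, SPEEDUP_VS_PREVIOUS)
    gpu_70b = implied_baseline(CEREBRAS_TOKENS_PER_S, SPEEDUP_VS_FASTEST_GPU)
    gpu_3b = implied_baseline(CEREBRAS_TOKENS_PER_S, SPEEDUP_VS_GPU_LLAMA_3B)

    print(f"Implied previous Cerebras speed: {previous:.0f} tokens/s")   # 700
    print(f"Implied fastest GPU (70B):       {gpu_70b:.2f} tokens/s")    # 131.25
    print(f"Implied GPU running Llama 3B:    {gpu_3b:.1f} tokens/s")     # 262.5

    # Time to generate a 1,000-token response at each implied rate.
    print(f"1,000 tokens at 2,100 tokens/s: {seconds_for(1000, CEREBRAS_TOKENS_PER_S):.2f} s")  # 0.48
    print(f"1,000 tokens on implied GPU:    {seconds_for(1000, gpu_70b):.2f} s")                # 7.62
```

Read this way, the claims imply roughly 700 tokens/s before the update and about 131 tokens/s for the fastest GPU solution on the same model, so a 1,000-token response would take about half a second versus roughly 7.6 seconds.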