4/6 Daniel Kim, Head of DevRel at Cerebras, gave a behind-the-scenes talk on supporting inference at over 2,000 tokens per second with Llama models.
Cerebras Achieves 2000 Tokens Per Second Llama Inference