Cerebras launched inference just 8 months ago. Today it is officially part of the Llama API. Any developer can now click a button and get a wafer-scale chip to generate ~2,600 tokens/s. Insane progress.
Cerebras Inference Now Available on Llama API Platform
