Because our compute and memory are ~20x larger, even 100B-parameter models fit entirely in memory without tensor- or pipeline-parallel gymnastics. Our GPT implementation uses 1/20th the lines of code of Nvidia Megatron. Huge models work out of the box on Cerebras. This is why G42 picked us.
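For a sense of scale, here is a back-of-the-envelope sketch of how much memory a 100B-parameter model needs. The byte counts are illustrative assumptions, not Cerebras figures: 2 bytes per parameter for fp16 weights, and roughly 16 bytes per parameter for mixed-precision Adam training (fp16 weights and gradients plus fp32 master weights and two optimizer moments), with activations left out. The helper name `model_memory_gb` is made up for this example.

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS_100B = 100e9  # 100 billion parameters

# Assumption: fp16 weights only, 2 bytes per parameter.
weights_fp16_gb = model_memory_gb(PARAMS_100B, 2)

# Assumption: mixed-precision Adam training state, about 16 bytes per
# parameter (2 fp16 weight + 2 fp16 gradient + 12 fp32 master/optimizer).
training_state_gb = model_memory_gb(PARAMS_100B, 16)

print(f"fp16 weights only: {weights_fp16_gb:,.0f} GB")
print(f"training state (no activations): {training_state_gb:,.0f} GB")
```

Under these assumptions the weights alone come to about 200 GB and the training state to about 1,600 GB, which is why models of this size are normally sharded across many devices with tensor or pipeline parallelism.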
Cerebras: 20x Memory Advantage Enables 100B Models Natively
