Cerebras Systems and Sandia National Labs have demonstrated training of a 1-trillion-parameter model on a single CS-3 system (!) This is ~1% of the footprint and power of an equivalent GPU cluster.
Cerebras CS-3 trains 1-trillion-parameter model efficiently
