The Cerebras Wafer Scale Cluster uses a unique, terabyte-scale external memory device called MemoryX to store model weights. For Sandia’s run, Cerebras configured a 55 terabyte MemoryX device, enough to comfortably store 1T parameters and optimizer states.
Cerebras MemoryX: 55TB External Memory for Trillion Parameters
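To see why 55 terabytes leaves comfortable room for 1T parameters plus optimizer states, a rough sizing estimate helps. The sketch below is not a Cerebras figure: it assumes a common mixed-precision Adam layout of 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments), and the names `BYTES_PER_PARAM` and `training_state_terabytes` are hypothetical. The actual layout used for Sandia's run may differ.

```python
# Back-of-the-envelope check: do 1T parameters plus optimizer state fit in 55 TB?
# ASSUMPTION: the per-parameter byte counts below follow a typical
# mixed-precision Adam setup; they are not taken from Cerebras documentation.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_terabytes(num_params: int) -> float:
    """Return the training-state size in decimal terabytes (1 TB = 10**12 bytes)."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 10**12


MEMORYX_CAPACITY_TB = 55  # capacity stated in the article
NUM_PARAMS = 10**12       # 1T parameters

required_tb = training_state_terabytes(NUM_PARAMS)
headroom_tb = MEMORYX_CAPACITY_TB - required_tb

print(f"required: {required_tb:.0f} TB, headroom: {headroom_tb:.0f} TB")
# prints "required: 16 TB, headroom: 39 TB"
```

Under these assumptions the training state takes roughly 16 TB, which leaves more than two thirds of the 55 TB device free. Changing the byte counts in `BYTES_PER_PARAM` shows how the estimate shifts for other precisions or optimizers.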