AI Dynamics

Global AI News Aggregator

Linear Scaling AI Model Training Across Multiple Systems

Once training ran on a single system, we scaled it to 2 and 16 systems with linear scaling. No model or code changes were required.
No Megatron, no DeepSpeed, no sharding. The model lives in a single block of memory, so there is no need to break it up. The whole thing trains like one giant GPU.
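
To make "linear scaling" concrete: it means that N systems deliver roughly N times the throughput of one system, so scaling efficiency stays near 1.0. The sketch below is only an illustration of that arithmetic. The function name and the throughput numbers are hypothetical placeholders, not figures from the post.

```python
def scaling_efficiency(base_throughput, scaled_throughput, num_systems):
    """Return measured speedup divided by the ideal (linear) speedup.

    A value of 1.0 means perfectly linear scaling; lower values mean
    some throughput was lost when adding systems.
    """
    if base_throughput <= 0 or num_systems < 1:
        raise ValueError("base_throughput must be > 0 and num_systems >= 1")
    speedup = scaled_throughput / base_throughput
    return speedup / num_systems


# Hypothetical throughputs in samples/sec, chosen only to illustrate
# what perfectly linear scaling at 2 and 16 systems would look like.
base = 1000.0
for n, measured in [(2, 2000.0), (16, 16000.0)]:
    print(n, scaling_efficiency(base, measured, n))
```

With real measurements, an efficiency that stays close to 1.0 as the system count grows is what a linear scaling claim amounts to.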

→ View original post on X — @cerebras