AI Dynamics

Global AI News Aggregator

Impact of Weight Sparsity on Activation Scales During Training

(2/n) Increasing weight sparsity causes activation scales to vanish under both standard parameterization (SP) and maximal update parameterization (μP), leading to poor training dynamics.
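
The effect can be illustrated at initialization with a minimal NumPy sketch. It is not the setup from the original thread: the layer sizes, the 1/fan_in weight variance, the random unstructured mask, and the helper name `activation_rms` are all assumptions chosen for illustration. Under those assumptions, masking a fraction `s` of the weights shrinks the root-mean-square of a linear layer's output by roughly `sqrt(1 - s)`.

```python
import numpy as np


def activation_rms(sparsity, fan_in=1024, fan_out=1024, batch=256, seed=0):
    """RMS of the outputs of one randomly masked linear layer (hypothetical helper).

    Assumptions: unit-variance Gaussian inputs, weights drawn with variance
    1/fan_in, and a random unstructured mask that zeroes a `sparsity`
    fraction of the weights on average.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, fan_in))
    w = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
    mask = rng.random((fan_in, fan_out)) >= sparsity  # True where a weight is kept
    y = x @ (w * mask)
    return float(np.sqrt(np.mean(y ** 2)))


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.75, 0.9375):
        measured = activation_rms(s)
        predicted = float(np.sqrt(1.0 - s))
        print(f"sparsity={s:.4f}  measured_rms={measured:.3f}  sqrt(1-s)={predicted:.3f}")
```

This covers only a single forward pass through one layer at initialization. It says nothing about gradients or update sizes, which the post's claim about training dynamics also concerns.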

→ View original post on X — @cerebras