AI Dynamics

Global AI News Aggregator


TorchTitan Model Configuration and ZeRO-3 vs. ZeRO-2 Trade-offs

Thanks for sharing the configs.
The defaults aren't tuned for a model this small (thanks to @lessw2020 for looking into it).
Most people who use TorchTitan don't train models in that size range.
Also, going from ZeRO-3 to ZeRO-2 is a compute/memory trade-off.

→ View original post on X: @soumithchintala
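To make the ZeRO-3 vs. ZeRO-2 trade-off concrete, here is a minimal back-of-the-envelope sketch. It assumes the model-state accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for 16-bit weights, 2 bytes for 16-bit gradients, and 12 bytes of optimizer state (fp32 master weights, momentum, variance). Activations are ignored. ZeRO-2 shards gradients and optimizer state but keeps a full copy of the weights on each GPU. ZeRO-3 also shards the weights, which saves memory but adds roughly one extra all-gather per step (communication of about 3x the parameter count versus 2x). The function name `estimate` and the 125M-parameter, 8-GPU example are hypothetical. In PyTorch FSDP2, this choice roughly corresponds to the `reshard_after_forward` argument of `fully_shard`, where `False` behaves ZeRO-2-like.

```python
from dataclasses import dataclass


@dataclass
class ZeroEstimate:
    stage: int
    bytes_per_gpu: float  # model states only: params + grads + optimizer state
    comm_volume: float    # elements communicated per step, in multiples of num_params


def estimate(num_params: float, dp_degree: int, stage: int, k: int = 12) -> ZeroEstimate:
    """Hypothetical helper: per-GPU model-state memory and per-step communication
    for ZeRO stage 2 or 3, using the ZeRO paper's mixed-precision Adam accounting
    (2 B weights + 2 B grads + k B optimizer state per parameter)."""
    p, n = num_params, dp_degree
    if stage == 2:
        # Full 16-bit weights on every GPU; grads and optimizer state are sharded.
        mem = 2 * p + (2 + k) * p / n
        # Reduce-scatter gradients + all-gather updated weights.
        comm = 2.0
    elif stage == 3:
        # Weights, grads, and optimizer state are all sharded.
        mem = (2 + 2 + k) * p / n
        # All-gather weights in forward, all-gather again in backward,
        # then reduce-scatter gradients.
        comm = 3.0
    else:
        raise ValueError("only stages 2 and 3 are modeled here")
    return ZeroEstimate(stage, mem, comm)


if __name__ == "__main__":
    # Hypothetical example: a 125M-parameter model on 8 data-parallel GPUs.
    for stage in (2, 3):
        e = estimate(125e6, dp_degree=8, stage=stage)
        print(f"ZeRO-{e.stage}: {e.bytes_per_gpu / 1e6:.2f} MB/GPU, comm {e.comm_volume:.0f}x params")
    # ZeRO-2: 468.75 MB/GPU, comm 2x params
    # ZeRO-3: 250.00 MB/GPU, comm 3x params
```

For a small model, the absolute memory that ZeRO-3 saves is modest, while its extra all-gather is paid on every step. Under these assumptions, ZeRO-2 may therefore be the faster choice when memory is not the bottleneck.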