AI Dynamics

Global AI News Aggregator

Memory Optimization for Multi-GPU Distributed AI Training

No, it’s very small, since we’re looping over each weight matrix individually. The only way memory usage would be non-negligible would be if you used a large number of devices, say 256 GPUs. In that case, don’t use my code 🙂
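To put rough numbers on that claim, here is a back-of-the-envelope sketch. It assumes the extra memory is a temporary buffer holding one copy of the current weight matrix per device, which the post does not confirm. The function name, the layer shapes, and the 4-byte element size are all hypothetical, chosen only for illustration.

```python
def peak_gather_bytes(matrix_shapes, num_devices, bytes_per_element=4):
    """Estimate the peak temporary buffer size in bytes.

    Assumption (not from the original post): matrices are processed one at
    a time, and each step holds one copy of the current matrix per device.
    The peak is therefore set by the largest single matrix, not by the
    total parameter count.
    """
    largest = max(rows * cols for rows, cols in matrix_shapes)
    return largest * num_devices * bytes_per_element


# Hypothetical layer shapes, for illustration only.
shapes = [(4096, 4096), (4096, 11008), (11008, 4096)]

for n in (8, 256):
    gib = peak_gather_bytes(shapes, n) / 2**30
    print(f"{n} devices: about {gib:.2f} GiB of temporary buffer")
```

Under these assumptions the overhead grows linearly with the device count and depends only on the largest matrix, which fits the point that it only becomes a concern at something like 256 GPUs.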

→ View original post on X — @jxmnop