AI Dynamics

Global AI News Aggregator

PyTorch DDP Gradient Syncing Bug in Distributed Training

So last week I was posting about a bug I had with gradient syncing in torch DistributedDataParallel (DDP), aka the most standard way to do multi-GPU training for neural networks. All of my misery originated with a single design decision from the torch team: DDP shares

→ View original post on X — @jxmnop