So last week I was posting about a bug I had with gradient syncing in torch DistributedDataParallel (DDP), aka the most standard way to do multi-GPU training for neural networks. All of my misery originated with a single design decision from the torch team: DDP shares
PyTorch DDP Gradient Syncing Bug in Distributed Training
