AI Dynamics

Global AI News Aggregator

Gradient Division in Multi-Step Training Reproducibility

To make this reproducible on rigs where there are multiple inner steps per training step, isn't the division of the gradient by that number of steps missing?
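The code the post refers to is not shown here, so the following is only a minimal sketch of the concern, not the original implementation. It assumes each inner step computes a loss averaged over its own micro-batch and that all micro-batches have the same size; the helper names `mse_grad` and `accumulated_grad` are hypothetical. Under those assumptions, summing the gradients of the inner steps without dividing by their count gives a gradient that scales with the number of inner steps, so results differ between rigs that split the batch differently.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, inner_steps, divide=True):
    """Accumulate gradients over equal-sized micro-batches (hypothetical helper)."""
    size = len(xs) // inner_steps
    total = 0.0
    for i in range(inner_steps):
        lo, hi = i * size, (i + 1) * size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])
    # Dividing by the number of inner steps turns the sum of per-step
    # mean gradients back into the mean gradient over the whole batch.
    return total / inner_steps if divide else total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)                                   # one step, whole batch
scaled = accumulated_grad(w, xs, ys, inner_steps=2)          # two inner steps, divided
unscaled = accumulated_grad(w, xs, ys, inner_steps=2, divide=False)

print(full, scaled, unscaled)
```

With the division, the accumulated gradient matches the single-step gradient on the full batch; without it, the gradient is larger by a factor equal to the number of inner steps, which acts like an unintended change in learning rate.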

→ View original post on X — @alexjc