We did see a handful of loss spikes, but they were very rare (maybe fewer than 5 or 10 over the entire training run) and never lasted more than a couple of iterations, so we didn't have to do anything like skipping batches or lowering the learning rate.
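For readers who want to check how rare and brief spikes are in their own runs, here is a minimal sketch of counting them from a logged loss curve. It is not the method used in the training described above (no intervention was needed there). The function name `find_loss_spikes`, the `window` size, and the `threshold` factor are all hypothetical choices for illustration.

```python
from collections import deque


def find_loss_spikes(losses, window=20, threshold=1.5):
    """Return the step indices where the loss exceeds `threshold` times
    the mean of the previous `window` non-spike losses.

    `window` and `threshold` are illustrative assumptions, not tuned values.
    """
    recent = deque(maxlen=window)  # rolling baseline of normal losses
    spikes = []
    for step, loss in enumerate(losses):
        baseline_ready = len(recent) == window
        if baseline_ready and loss > threshold * (sum(recent) / window):
            # A training loop that wanted to intervene could skip the
            # batch or lower the learning rate at this point.
            spikes.append(step)
            continue  # keep the spike out of the rolling baseline
        recent.append(loss)
    return spikes


# Synthetic loss curve: flat at 2.0 with a two-iteration spike.
losses = [2.0] * 30 + [5.0, 4.0] + [2.0] * 30
print(find_loss_spikes(losses))
```

Because flagged values are kept out of the baseline, a spike lasting a couple of iterations is reported at each of its steps rather than raising the baseline and hiding the second one.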
Loss spikes during training were rare and brief