Cool! For the spike I'd try e.g. `-sl 7 -sg 7` to keep instability in check earlier in the training. (will skip update if loss/gradnorm > 7 sigma outlier is detected)
Training stability control with loss and gradient norm thresholding
By
–
Global AI News Aggregator
By
–
Cool! For the spike I'd try e.g. `-sl 7 -sg 7` to keep instability in check earlier in the training. (will skip update if loss/gradnorm > 7 sigma outlier is detected)
Leave a Reply