
Training Loss Noise and Validation Loss Smoothing Explained

Training loss is evaluated over a single batch, i.e. 0.5M tokens. It's noisy, but this is expected: any given batch may contain easy or hard documents from the training data. The validation loss is averaged over 20 batches of 0.5M tokens each (this is a hyperparameter), so it is smoother.

→ View original post on X (@karpathy)
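To make the distinction concrete, here is a minimal sketch of a nanoGPT-style evaluation helper. The interface is an assumption, not a detail from the post: it presumes `model(x, y)` returns `(logits, loss)` and `get_batch(split)` yields one batch of ~0.5M tokens. The noisy training curve comes straight from `loss.item()` on the single batch used for each gradient step, while the validation loss averages `eval_iters = 20` batches, smoothing out document-level variance:

```python
import torch

eval_iters = 20  # hyperparameter: number of validation batches to average over

@torch.no_grad()
def estimate_val_loss(model, get_batch):
    """Average the loss over `eval_iters` validation batches.

    A single batch's loss depends on whether its sampled documents
    happen to be easy or hard, so averaging over many batches gives
    a smoother estimate than any one batch can.
    """
    model.eval()
    losses = torch.zeros(eval_iters)
    for i in range(eval_iters):
        x, y = get_batch("val")   # one batch of ~0.5M tokens (hypothetical helper)
        _, loss = model(x, y)     # assumes the model returns (logits, loss)
        losses[i] = loss.item()
    model.train()
    return losses.mean().item()
```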
