AI Dynamics

Global AI News Aggregator

Olmo 2 Ablation Study on Loss Spikes and Normalization Techniques

The Olmo 2 report includes an ablation study showing that their post-norm flavor reduces loss spikes during training. The ablation also included QK-norm, though, so it's hard to say how much of that reduction comes from QK-norm and how much from the post-norm placement.
Maybe the best of both worlds is to do both, like Gemma.
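
To make the placements concrete, here is a minimal pure-Python sketch of one residual sub-block under three normalization placements. It reads "both" as pre- plus post-normalization, which is an assumption about the post's meaning. The names `block`, `toy_sublayer`, and `update_rms` are hypothetical, RMSNorm is shown without its learned scale, and QK-norm (normalizing queries and keys inside attention) is left out entirely.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale: divide by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def add(a, b):
    """Element-wise residual addition."""
    return [u + v for u, v in zip(a, b)]

def block(x, sublayer, placement):
    """One residual sub-block with the chosen norm placement (illustrative only)."""
    if placement == "pre":    # x + f(norm(x))
        return add(x, sublayer(rms_norm(x)))
    if placement == "post":   # x + norm(f(x)): norm on the sublayer output, inside the residual
        return add(x, rms_norm(sublayer(x)))
    if placement == "both":   # x + norm(f(norm(x)))
        return add(x, rms_norm(sublayer(rms_norm(x))))
    raise ValueError(f"unknown placement: {placement}")

def toy_sublayer(x):
    """Hypothetical stand-in for attention/MLP that produces large activations."""
    return [100.0 * v for v in x]

def update_rms(x, sublayer, placement):
    """RMS of the update that the sub-block adds to the residual stream."""
    out = block(x, sublayer, placement)
    update = [o - i for o, i in zip(out, x)]
    return math.sqrt(sum(u * u for u in update) / len(update))

x = [1.0, -2.0, 3.0, 0.5]
for placement in ("pre", "post", "both"):
    print(placement, round(update_rms(x, toy_sublayer, placement), 3))
```

In this toy setup, normalizing the sublayer output keeps the residual update at roughly unit scale however large the sublayer's activations are, while the pre-norm-only block passes the large output straight into the residual stream. That is one intuition for why output normalization might dampen loss spikes. It is an illustration, not a reproduction of the Olmo 2 ablation.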

→ View original post on X (@rasbt)
