AI Dynamics

Global AI News Aggregator

Models trained on 1T tokens with continued improvement at 7B scale

All of our models were trained on at least 1T tokens, far more than is typically used at this scale.
Interestingly, even after 1T tokens the 7B model was still improving.
3/n
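A rough way to quantify "far more than is typically used at this scale": the Chinchilla scaling study (Hoffmann et al., 2022) is commonly summarized as a rule of thumb of roughly 20 training tokens per model parameter for compute-optimal training. The sketch below (an illustration, not from the original post; the constant and figures are assumptions based on that heuristic and the 7B/1T numbers quoted above) compares that budget to the 1T tokens mentioned:

```python
# Hypothetical illustration of the Chinchilla tokens-per-parameter
# rule of thumb (~20 tokens per parameter), compared to the 1T-token
# budget mentioned in the post for the 7B model.

CHINCHILLA_TOKENS_PER_PARAM = 20  # widely cited approximation

def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token budget for a model with n_params parameters."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params

model_params = 7e9   # 7B-parameter model
tokens_used = 1e12   # "at least 1T tokens", per the post

optimal = compute_optimal_tokens(model_params)  # 1.4e11, i.e. ~140B tokens
ratio = tokens_used / optimal                   # ~7x the heuristic budget

print(f"Compute-optimal budget (heuristic): {optimal:.2e} tokens")
print(f"Tokens actually used:               {tokens_used:.2e} ({ratio:.1f}x more)")
```

Under this heuristic a 7B model would be "done" around 140B tokens, so continued improvement past 1T suggests smaller models can profitably absorb far more data than the compute-optimal point implies.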

→ View original post on X: @guillaumelample
