AI Dynamics

Global AI News Aggregator

FineWeb 2.0: 3 Trillion Token Multilingual Training Corpus Released

FineWeb 2.0: 8 terabytes, 3 trillion tokens, 1,000 languages. The announcement calls it the best multilingual pre-training corpus available, and it is released under a commercially permissive license.

→ View original post on X, by @reach_vb
