AI Dynamics

Global AI News Aggregator

FineWeb Team Releases FineWeb2, a Multilingual Dataset for AI Pretraining

The FineWeb team is happy to finally release "FineWeb2". FineWeb 2 extends the data-driven approach to pretraining dataset design introduced in FineWeb 1 and now covers 1893 languages. In our experiments, it outperforms all other publicly available multilingual pretraining datasets.

→ View original post on X: @thom_wolf
