FineWeb 2.0: 8 terabytes, 3 trillion tokens, 1,000 languages. Simply the best multilingual pre-training corpus out there, and available under a commercially permissive license!
FineWeb 2.0: 3-Trillion-Token Multilingual Training Corpus Released
Global AI News Aggregator