AI Dynamics

Global AI News Aggregator

LLM360 releases TxT360: 15T token pre-training dataset

An impressive release from LLM360: TxT360, a new pre-training dataset of 15T tokens. It includes many sources not found in previous open-source pre-training datasets, such as FreeLaw and PG-19 (books), among others. It's really interesting

→ View original post on X (@maximelabonne)