AI Dynamics

Global AI News Aggregator

Understanding FineWeb: Beyond CommonCrawl Dataset Architecture

Yeah, once this paper was released I was like “ohhh, so FineWeb isn’t simply CommonCrawl plus plus” and it all clicked into place. @eugeneyan pointed me to this Apple paper we’ve talked about on the pod.

→ View original post on X — @swyx