AI Dynamics

Global AI News Aggregator

Hugging Face Research Pretraining Data Accidentally Leaked Publicly

Oh shit, it seems like all of the HF Research team's pretraining data has been accidentally leaked to the public. The web, PDF, and synthetic datasets are exposed on the HF FineData org… Apparently, an intern used CC to push the data with private=False.

→ View original post on X — @thom_wolf, 2026-03-31 18:47 UTC