AI Dynamics

Global AI News Aggregator


Synthetic Data Impact on LLM Pre-training Datasets

Yeah, lots of LLMs are trained on synthetic data these days, especially at the fine-tuning stage, but that's deliberate. The open question for me is still the impact of accidental synthetic data in the larger pre-training data pool.

Original post on X by @simonw.