Yeah, lots of LLMs are trained on synthetic data these days, especially at the fine-tuning stage, but that's deliberate. The open question for me is still the impact of accidental synthetic data in the larger pre-training data pool.