3/ The core idea to pay attention to: synthetic data can give a short-term boost in eval results, but you will pay for it later with model collapse! By mangling the model you accumulate debt that starts out invisible and is very hard to repay.
Synthetic Data Training Risks Model Collapse Long Term