Check out "DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining" by @sangmichaelxie et al., along with some of the papers it cites.
DoReMi: Optimizing Data Mixtures for Faster Language Model Pretraining