Isn't it also related to diversity in your dataset? 2B = not diverse enough to scale, 10B = maybe okay?
Dataset Diversity Requirements for Language Model Scaling