For how long should you train your language model? How large should your model be? Research from @DbrxMosaicAI proposes a modified scaling law that quantifies the training-inference trade-off – producing models that are optimal over their total lifetime. https://
dbricks.co/3zSwPJX
Scaling Laws: Training Duration and Model Size Optimization
By
–
