Training Smaller Models Longer Challenges Chinchilla Predictions

There is a fascinating recent trend of training *smaller models for longer* w.r.t. Chinchilla-optimal predictions. Best explanation I've seen of this? This new blog post by @harm_devries (with collaborators of the @BigCodeProject): https://harmdevries.com/post/model-size-vs-compute-overhead/
… Clearly these are only
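The trade-off behind "smaller models for longer" can be sketched numerically. This is a minimal illustration, assuming the parametric loss fit L(N, D) = E + A/N^α + B/D^β with the constants reported in the Chinchilla paper (Hoffmann et al., 2022), where N is the parameter count and D the number of training tokens, plus the common approximation that training compute is C ≈ 6ND FLOPs. It finds the compute-optimal (N, D) for a budget, shrinks the model to a fraction of that size, solves for how many tokens the smaller model would need to reach the same predicted loss, and reports the extra compute this costs. The function names and the 1e23 FLOP budget are hypothetical, and the sketch is not claimed to reproduce the linked post's exact calculations.

```python
from typing import Optional, Tuple

# Parametric loss fit reported in the Chinchilla paper (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# N = parameter count, D = training tokens.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def chinchilla_optimal(compute: float) -> Tuple[float, float]:
    """Loss-minimizing (N, D) for a FLOP budget, assuming compute = 6 * N * D."""
    g = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
    a = BETA / (ALPHA + BETA)
    b = ALPHA / (ALPHA + BETA)
    return g * (compute / 6) ** a, (compute / 6) ** b / g


def compute_overhead(compute: float, size_fraction: float) -> Optional[float]:
    """Extra compute, as a fraction of `compute`, that a model of
    size_fraction * N_opt needs to match the compute-optimal model's loss.

    Returns None when the model is too small to reach that loss with any
    amount of data (the data term alone cannot close the gap).
    """
    n_opt, d_opt = chinchilla_optimal(compute)
    target = loss(n_opt, d_opt)
    n = size_fraction * n_opt
    # Loss budget left for the data term B / D**BETA after fixing the model size.
    residual = target - E - A / n**ALPHA
    if residual <= 0:
        return None
    d = (B / residual) ** (1 / BETA)  # tokens needed to hit the target loss
    # Training FLOPs scale with N * D, so the factor 6 cancels in the ratio.
    return (n * d) / (n_opt * d_opt) - 1


if __name__ == "__main__":
    budget = 1e23  # hypothetical FLOP budget
    for k in (1.0, 0.75, 0.5, 0.25, 0.05):
        overhead = compute_overhead(budget, k)
        label = "unreachable" if overhead is None else f"{overhead:+.1%}"
        print(f"model at {k:.0%} of N_opt: compute overhead {label}")
```

The design choice here is to hold the target loss fixed and solve only for the data term. Under these assumptions the overhead grows slowly at first as the model shrinks, then blows up as the size fraction approaches the point where the model-size term A/N^α alone exceeds the target loss; below that point no amount of extra data helps, which is why very small fractions come back as unreachable.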