no. people misunderstand chinchilla.
chinchilla doesn't tell you the point of convergence.
it tells you the point of compute optimality.
if all you care about is perplexity, then for a given FLOPs budget, how big a model should you train, and on how many tokens?
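
a rough sketch of what that question looks like in numbers. this is not the paper's fitted law, just the two usual rules of thumb, both of which are assumptions here: training compute C ≈ 6 · N · D (N parameters, D tokens), and about 20 tokens per parameter at the compute-optimal point. the function name `compute_optimal` and its defaults are made up for illustration.

```python
import math


def compute_optimal(flops, tokens_per_param=20.0, flops_per_param_token=6.0):
    """Split a training FLOPs budget into (parameters, tokens).

    Assumptions (rules of thumb, not fitted values):
      - C ~= flops_per_param_token * N * D
      - D ~= tokens_per_param * N at the compute-optimal point
    Substituting D into the first relation gives
      N = sqrt(C / (flops_per_param_token * tokens_per_param)).
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    n_params = math.sqrt(flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Sweep a few budgets to see how model size and data grow together.
    for budget in (1e21, 1e22, 1e23, 5.88e23):
        n, d = compute_optimal(budget)
        print(f"C={budget:.2e} FLOPs -> N={n:.2e} params, D={d:.2e} tokens")
```

under these assumptions a budget of about 5.88e23 FLOPs comes out near 7e10 parameters and 1.4e12 tokens. note what the function does not return: anything about where the loss stops improving. it only says how to split a fixed budget.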
for reasons not fully
Chinchilla Scaling Laws: Compute Optimality vs Convergence Point