Our results first confirm the inverse scaling behavior seen in prior models trained with up to 500 zettaFLOPs of compute. At 2K zettaFLOPs, however, the scaling curve becomes U-shaped. U-shaped scaling has also been reported in prior work, such as BIG-Bench.
U-shaped Scaling Behavior Emerges at Higher Computational Budgets