Scaling Laws Enable Efficient Transformer Hyperparameter Optimization

Modern transformers follow well-behaved scaling laws, which are a function of the number of training tokens and the number of parameters.
This makes it possible to tune hyperparameters at a smaller scale, and then keep scaling up parameters and data according to a power law.
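
As a rough illustration of the idea, the sketch below fits a power law to a handful of small-scale runs and extrapolates it to a larger model. Everything in it is an assumption made for illustration: the functional form `loss = a * params**(-b)`, the helper names `fit_power_law` and `predict`, and the data points, which are synthetic rather than real measurements. The post itself does not specify a functional form or a fitting procedure.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**(-b) by least squares in log-log space; return (a, b)."""
    # log(y) = log(a) - b * log(x), so a straight-line fit recovers both terms.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict(a, b, x):
    """Evaluate the fitted power law at x (scalar or array)."""
    return a * np.asarray(x, dtype=float) ** (-b)


# Synthetic "small-scale" runs, generated from a known power law.
# These values are made up for illustration only.
true_a, true_b = 50.0, 0.08
small_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
small_losses = true_a * small_params ** (-true_b)

# Fit on the small runs, then extrapolate to a 10x larger model.
a, b = fit_power_law(small_params, small_losses)
predicted_loss_at_1b = float(predict(a, b, 1e9))

print(f"fitted a={a:.3f}, b={b:.4f}")
print(f"extrapolated loss at 1e9 parameters: {predicted_loss_at_1b:.4f}")
```

With real training runs the points would be noisy, so the fit would be approximate, and it is worth holding out the largest small-scale run to check the extrapolation before trusting it at a much bigger scale.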

→ View original post on X (@soumithchintala)