Ah, OK, these are the GPT-3 hparams. It sounds like this alone would be enough to beat GPT-2, and it's unknown whether 290B more tokens would get it to match or beat GPT-3 Small. Looking forward to the 1.5B! Fascinating to see these all documented and taught live!!
GPT-3 Hyperparameter Analysis and Model Scaling Expectations