their model is trained on 800K words, which is around 2.5M tokens. remember today's models are up to ~25T tokens (10^7 more, or 10 million million times) (they trained another model on 15M words)
Model Training Scale: From 800K to 25 Trillion Tokens
By
–
