AI Dynamics

Global AI News Aggregator

llm.c Outperforming GPT-2/3 with Fewer Training Tokens

In llm.c pretraining we were already mildly perplexed as to why we seem to be outperforming GPT-2 & 3 (124M) while training on just 10B tokens instead of something closer to 100-300B, per the original papers. I suspect a good chunk of it may just be dataset quality, so I'm eager to retrain.

→ View original post on X — @karpathy
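
For reference, the llm.c 124M run discussed above pretrains on a roughly 10B-token sample of FineWeb rather than a WebText-style corpus, which is why dataset quality is the suspected factor. Below is a minimal sketch of that data-preparation step in the spirit of llm.c's dev/data/fineweb.py: it streams the FineWeb sample from the Hugging Face Hub and tokenizes it with the GPT-2 BPE. The dataset config name "sample-10BT", the shard size, and the output file layout here are illustrative assumptions, not llm.c's exact pipeline.

# Sketch: stream a ~10B-token FineWeb sample and tokenize it with the GPT-2 BPE,
# roughly as llm.c prepares pretraining shards (dev/data/fineweb.py).
# Assumptions: the "HuggingFaceFW/fineweb" dataset config "sample-10BT" is used;
# shard size and .bin layout below are illustrative only.
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
eot = enc.eot_token  # end-of-text token, used here as a document separator

ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                  split="train", streaming=True)

SHARD_TOKENS = 100_000_000  # illustrative shard size in tokens
buf, shard_idx = [], 0
for doc in ds:  # streaming: processes the full ~10B tokens if left to run
    buf.append(eot)
    buf.extend(enc.encode_ordinary(doc["text"]))
    while len(buf) >= SHARD_TOKENS:
        shard = np.array(buf[:SHARD_TOKENS], dtype=np.uint16)  # GPT-2 vocab fits in uint16
        shard.tofile(f"fineweb_train_{shard_idx:06d}.bin")
        buf = buf[SHARD_TOKENS:]
        shard_idx += 1

The resulting token shards are what a 124M-parameter GPT-2 run then consumes; the point of the comparison in the post is that this 10B-token budget is roughly an order of magnitude smaller than the 100-300B tokens cited for the original GPT-2/3 training runs.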
