
Why 10B Tokens Suffice for GPT Training Performance

Great question. Yes, I was surprised that 10B tokens seemed to be enough. I believe GPT-2 was trained on somewhere around ~100B tokens. The reason we reach this performance within 10B tokens, I think, may be the following: 1. FineWeb could simply be higher quality than WebText on a per-token basis. This was …

→ View original post on X — @karpathy
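
To make the quoted scale concrete, here is a minimal back-of-the-envelope sketch (in Python) of how a 10B-token budget translates into optimizer steps. The batch size and sequence length below are assumed illustrative values (a GPT-2-style 1024-token context), not figures from the original post.

```python
# Illustrative arithmetic only: batch size and sequence length are assumptions,
# not values reported in the quoted post.

def training_steps(total_tokens: int, batch_size: int, seq_len: int) -> int:
    """Optimizer steps needed to consume `total_tokens` once (no repeats)."""
    tokens_per_step = batch_size * seq_len
    return total_tokens // tokens_per_step

if __name__ == "__main__":
    TOTAL_TOKENS = 10_000_000_000   # the 10B-token budget discussed above
    BATCH_SIZE = 512                # assumed sequences per optimizer step
    SEQ_LEN = 1024                  # GPT-2 context length
    steps = training_steps(TOTAL_TOKENS, BATCH_SIZE, SEQ_LEN)
    print(f"{steps:,} steps to see {TOTAL_TOKENS:,} tokens")
    # ~19,073 steps at ~0.52M tokens per step.
```

At the same per-step token count, a ~100B-token run like the original GPT-2 training would need roughly ten times as many steps, which is what makes the 10B figure surprising.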
