AI Dynamics

Global AI News Aggregator

FineWeb Dataset: Improvements Over GPT-2 Training Data

10B tokens of FineWeb! Ilya said WebText was 40B tokens (https://youtube.com/watch?v=13CZPWmke6A&t=3645s, for GPT-2 1.5B). What accounts for the improved loss/accuracy you got over GPT-2? Have we improved our dataset filtering? Were there smarter hyperparameter choices made here? Any ballpark attributions?

→ View the original post on X by @swyx
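For context on the numbers in the post, here is a minimal sketch of how one might stream the 10B-token FineWeb sample and tally GPT-2 tokens for an apples-to-apples comparison with WebText. The dataset path and config name follow the public Hugging Face release ("HuggingFaceFW/fineweb", config "sample-10BT"); the counting loop is illustrative, not part of the original post.

```python
from datasets import load_dataset
import tiktoken

# GPT-2 BPE tokenizer, so counts are comparable to WebText token figures.
enc = tiktoken.get_encoding("gpt2")

# Stream the 10B-token FineWeb sample so nothing is downloaded up front.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",
    split="train",
    streaming=True,
)

total_tokens = 0
for i, doc in enumerate(fineweb):
    total_tokens += len(enc.encode_ordinary(doc["text"]))
    if i >= 1_000:  # count a small slice; tokenizing all 10B takes hours
        break

print(f"~{total_tokens:,} GPT-2 tokens in the first 1,001 documents")
```

Streaming keeps memory flat and avoids a multi-hundred-gigabyte download, which is why it is the usual way to inspect FineWeb-scale corpora.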
