10B tokens of FineWeb! Ilya said WebText was 40B tokens (https://youtube.com/watch?v=13CZPWmke6A&t=3645s – for GPT-2 1.5B). What accounts for the improved loss/accuracy you got over GPT-2 – have we improved our dataset filtering? Were there smarter hyperparameter choices made here? Any ballpark attributions?
FineWeb Dataset: Improvements Over GPT-2 Training Data
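For concreteness on the token count in the question: the 10B-token FineWeb sample is distributed as the `sample-10BT` config of the `HuggingFaceFW/fineweb` dataset on the Hugging Face Hub (its token counts are reported with the GPT-2 tokenizer). The sketch below is an illustration of loading that sample, not the training setup used here; the streaming flag and the 100-document cutoff are just a quick sanity check.

```python
# Minimal sketch: stream the 10B-token FineWeb sample and count GPT-2 tokens.
from datasets import load_dataset
from transformers import AutoTokenizer

# "sample-10BT" is the ~10B-token subset of FineWeb on the Hub.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",
    split="train",
    streaming=True,  # avoid downloading the whole sample up front
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tokenize the first 100 documents as a rough check of tokens per doc.
total_tokens = 0
for i, doc in enumerate(fineweb):
    total_tokens += len(tokenizer(doc["text"]).input_ids)
    if i >= 99:
        break

print(f"GPT-2 tokens in first 100 documents: {total_tokens}")
```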