If anything, this graph tells me that RedPajama is worse than the Pile (Pythia) at the same number of pretraining tokens. People can read graphs, right? But yeah, few-shot at 3B is probably just noise anyway, tbh.
RedPajama vs Pile: Dataset Quality Comparison Analysis