an interesting trend in AI is that the best datasets have fewer and fewer, longer and longer sequences

dataset five years ago:
~10^5 examples, each of 2^6 tokens

nowadays:
~10^3 examples, each of 2^15 tokens

it’s actually more data. but the tokens are stacked horizontally now
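
a quick sanity check on the "more data" claim, taking the rough figures above at face value (they are order-of-magnitude estimates, and the function and variable names here are just illustrative):

```python
def total_tokens(num_examples: int, tokens_per_example: int) -> int:
    """Total token count for a dataset of equal-length examples."""
    return num_examples * tokens_per_example


# Rough figures from the post, treated as exact for the arithmetic.
old_total = total_tokens(10**5, 2**6)    # many short sequences
new_total = total_tokens(10**3, 2**15)   # few long sequences

print(f"five years ago: {old_total:,} tokens")
print(f"nowadays:       {new_total:,} tokens")
print(f"ratio:          {new_total / old_total:.2f}x")
```

so with these numbers the example count drops by 100x, but each example is 512x longer, which works out to roughly 5x more tokens overall.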
AI Datasets Shift: Fewer Examples, Longer Sequences