AI Dynamics

Global AI News Aggregator


AI Datasets Shift: Fewer Examples, Longer Sequences

An interesting trend in AI is that the best datasets have fewer and fewer, longer and longer sequences.

Datasets five years ago: ~10^5 examples, each of 2^6 tokens.
Nowadays: ~10^3 examples, each of 2^15 tokens.

It's actually more data, but the tokens are stacked horizontally now.
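The "it's actually more data" claim can be checked with a quick calculation. This is a minimal sketch that assumes every example has exactly the stated length. That is a simplification, because the post gives only rough orders of magnitude. The helper `total_tokens` is a hypothetical name used for illustration. Under that assumption, the older shape holds 6.4 million tokens and the newer shape holds about 32.8 million, roughly five times as many.

```python
# Back-of-the-envelope token counts for the two dataset shapes in the post.
# Assumption: every example has exactly the stated length; the post's figures
# are rough orders of magnitude, not measured values.

def total_tokens(num_examples: int, tokens_per_example: int) -> int:
    """Hypothetical helper: total tokens when all examples share one length."""
    return num_examples * tokens_per_example

then_total = total_tokens(10**5, 2**6)   # ~10^5 examples x 64 tokens each
now_total = total_tokens(10**3, 2**15)   # ~10^3 examples x 32,768 tokens each

print(f"five years ago: {then_total:,} tokens")          # 6,400,000 tokens
print(f"nowadays:       {now_total:,} tokens")           # 32,768,000 tokens
print(f"ratio:          {now_total / then_total:.2f}x")  # 5.12x
```

In this sense, "stacked horizontally" means the total grows along the sequence axis (2^6 to 2^15 tokens per example) even though the example axis shrinks (10^5 to 10^3).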

→ View original post on X — @jxmnop