I think this is a reasonable point in theory. In practice, we don't have those model sizes. But assuming that we did, I think there's still something interesting going on with emergence, e.g., "We can't predict a performance spike at, say, 10B parameters from models at 5B, 4.9B, 4.8B…"
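To make the point concrete, here is a minimal sketch with entirely synthetic, hypothetical numbers: performance is assumed to follow a smooth power law on the observed range of model sizes, then spike at 10B. A power-law fit to the sub-5B models extrapolates the smooth trend and misses the spike entirely — which is the sense in which the spike is unpredictable from below.

```python
import math

# Hypothetical, synthetic numbers for illustration only: assume accuracy
# follows a smooth power law up to 5B parameters.
sizes = [1e9, 2e9, 3e9, 4e9, 4.8e9, 4.9e9, 5e9]            # parameter counts
acc = [0.02 * (n / 1e9) ** 0.3 for n in sizes]             # smooth trend

# Ordinary least squares in log-log space, i.e. a power-law fit.
xs = [math.log(n) for n in sizes]
ys = [math.log(a) for a in acc]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
intercept = my - slope * mx

# Extrapolate the fitted trend out to 10B parameters.
predicted_10b = math.exp(slope * math.log(10e9) + intercept)
observed_10b = 0.60  # hypothetical emergent spike at 10B

print(f"extrapolated accuracy at 10B: {predicted_10b:.3f}")  # ~0.04
print(f"hypothetical observed accuracy: {observed_10b:.2f}")
```

The fit is essentially perfect on the data it sees, yet its extrapolation to 10B is off by an order of magnitude relative to the hypothetical spike — the smooth curve carries no signal about the discontinuity.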
Unpredictable Performance Spikes in Scaling Neural Networks