Enjoyed this paper that plots emergent abilities with pretraining loss on the x-axis, which is actually a suggestion that @OriolVinyalsML also made a few years back: https://
arxiv.org/abs/2403.15796 The paper uses intermediate checkpoints to plot a variety of pretraining losses. For some
Emergent Abilities Plotted Against Pretraining Loss in New Paper
By
–
Leave a Reply