oh cool, is it *pretraining* data? and they do the same gradient-matching thing? i wonder what the advantage is over just mixing in and doing supervised training on a few examples every now and again
Pretraining Data and Gradient-Matching Training Approaches