I think this only works when a larger model generates data for a smaller one. Otherwise, if a large model generates data to train the same large model, with no other external signal, it will likely just learn something very similar to what it already knows.
Large Models Training Smaller Models: Data Generation Strategy