What people call "distillation" is a super common practice (you use other models to benchmark your own model, to evaluate your inputs, or to add a little bit to your datasets) that, in my opinion, should be covered by fair use (just as using public data is), especially when the
Model Distillation Fair Use Debate in AI Training