Born Again Neural Networks is an even earlier example, from 2018, showing that you can improve a CIFAR classifier's performance by distilling it into an identically sized student model:
Born Again Neural Networks: distillation improves CIFAR classifiers
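To make the idea concrete, here is a minimal sketch of the kind of loss used when distilling one model into another of the same size. It is an illustration only, not the paper's exact training recipe: the function names, the `temperature` parameter, and the toy logits are all assumptions chosen for this example, and a real setup would apply the loss inside a full training loop on CIFAR.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax; `temperature` is an illustrative knob,
    # not a value taken from the paper.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Cross-entropy of the student's predicted distribution against the
    # teacher's soft targets (hypothetical helper for this sketch).
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Toy logits for a single 3-class example (made-up numbers).
teacher_logits = [2.0, 0.5, -1.0]

# A student that agrees with the teacher incurs a lower loss than one
# that ranks the classes in the opposite order.
matching = distillation_loss(teacher_logits, teacher_logits)
mismatched = distillation_loss([-1.0, 0.5, 2.0], teacher_logits)
```

In the self-distillation setting described above, the student shares the teacher's architecture, so any improvement comes from training on the teacher's soft targets rather than from extra capacity.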