Good one! Intermediate Layer Distillation is another technique that I’ve read about. Instead of only matching the teacher’s final output distribution, you also align intermediate hidden states, attention patterns, or feature maps across corresponding layers. The intuition
Intermediate Layer Distillation: Aligning AI Models with Hidden States
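To make the idea in the comment above concrete, here is a minimal sketch of a combined loss: a mean-squared-error term that pulls selected student hidden states toward the corresponding teacher hidden states, plus the usual temperature-softened KL term on the final outputs. It uses plain Python lists so it runs without dependencies; a real training setup would use a tensor library. Everything named here (`ild_loss`, `layer_map`, `projection`, `alpha`, `temperature`) is a hypothetical choice for illustration and does not come from the original text. The linear projection is an assumption for the common case where the student's hidden size is smaller than the teacher's.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def project(hidden, weight):
    """Map a student vector into teacher space; weight has one row per teacher dim."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def ild_loss(student_hiddens, teacher_hiddens, layer_map, projection,
             student_logits, teacher_logits, temperature=2.0, alpha=0.5):
    """Hypothetical combined loss: hidden-state matching plus output matching.

    layer_map is a list of (student_layer, teacher_layer) index pairs.
    alpha weights the hidden-state term against the output term.
    """
    # Intermediate term: average MSE over the mapped layer pairs.
    hidden_term = 0.0
    for s_idx, t_idx in layer_map:
        projected = project(student_hiddens[s_idx], projection)
        hidden_term += mse(projected, teacher_hiddens[t_idx])
    hidden_term /= len(layer_map)

    # Output term: KL between softened teacher and student distributions,
    # scaled by temperature squared as is conventional in distillation.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    output_term = kl_divergence(p_teacher, p_student) * temperature ** 2

    return alpha * hidden_term + (1 - alpha) * output_term


# Toy example: a 2-layer student (dim 2) mapped onto layers 1 and 3
# of a 4-layer teacher (dim 3). All numbers are made up.
student_hiddens = [[0.2, -0.1], [0.5, 0.3]]
teacher_hiddens = [[0.0] * 3, [0.1, 0.0, -0.2], [0.0] * 3, [0.4, 0.2, 0.1]]
projection = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 x 2, normally learned
loss = ild_loss(student_hiddens, teacher_hiddens, [(0, 1), (1, 3)], projection,
                student_logits=[1.0, 0.5, -0.5], teacher_logits=[2.0, 0.0, -1.0])
print(f"combined loss: {loss:.4f}")
```

Which student layer is paired with which teacher layer is a design decision. The evenly spaced mapping used in the toy example is one simple option, not the only one.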