AI Dynamics

Global AI News Aggregator

Token Importance in On-Policy Distillation with Selective Training

"TIP: Token Importance in On-Policy Distillation": This paper introduces selective token training for on-policy distillation, relying on student entropy to find high-signal tokens. A key point is that entropy misses confident mistakes, so they add teacher-student divergence to…

→ View original post on X: @askalphaxiv
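The post is cut off before it says how entropy and divergence are combined, so the following is only a minimal sketch under stated assumptions, not the paper's method. It scores each generated position two ways. The first score is the student's entropy. The second is KL(student || teacher), with the reverse-KL direction assumed here. It keeps the union of the top-k positions under each score and averages the distillation loss over those positions only. The function names, the union rule, the KL direction, and the `k_entropy`/`k_div` parameters are all hypothetical. The toy data includes a "confident mistake": a position where the student is near-certain but the teacher prefers a different token. It has low entropy, so an entropy-only filter skips it, while the divergence score catches it.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_entropy(student_logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of the student's next-token distribution."""
    logp = log_softmax(student_logits)
    return -sum(math.exp(lp) * lp for lp in logp)

def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at one position (direction is an assumption)."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))

def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def select_tokens(student, teacher, k_entropy: int, k_div: int) -> List[int]:
    """Hypothetical selection rule: union of high-entropy and high-divergence positions."""
    ent = [token_entropy(s) for s in student]
    div = [reverse_kl(s, t) for s, t in zip(student, teacher)]
    return sorted(set(top_k(ent, k_entropy)) | set(top_k(div, k_div)))

def selective_distill_loss(student, teacher, selected: Sequence[int]) -> float:
    """Mean per-token reverse KL over the selected positions only."""
    if not selected:
        return 0.0
    return sum(reverse_kl(student[i], teacher[i]) for i in selected) / len(selected)

if __name__ == "__main__":
    # Four positions, vocabulary of 3 tokens (toy logits).
    student = [[0.0, 0.0, 0.0],   # uncertain student, teacher agrees
               [5.0, 0.0, 0.0],   # confident and correct
               [5.0, 0.0, 0.0],   # confident mistake: teacher prefers token 2
               [1.0, 0.0, 0.0]]   # mildly uncertain, teacher agrees
    teacher = [[0.0, 0.0, 0.0],
               [5.0, 0.0, 0.0],
               [0.0, 0.0, 5.0],
               [1.0, 0.0, 0.0]]
    print(select_tokens(student, teacher, k_entropy=2, k_div=0))  # [0, 3]: misses position 2
    print(select_tokens(student, teacher, k_entropy=1, k_div=1))  # [0, 2]
    print(selective_distill_loss(student, teacher, [0, 2]))
```

Under the same budget of two positions, the entropy-only selection never trains on position 2. Adding the divergence score brings it in. The union rule is just one simple way to combine the two signals; a weighted sum or threshold would fit the same idea.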
