AI Dynamics

Global AI News Aggregator

Co-Evolving Policy Distillation Tackles Expert-Student Drift in Post-Training

"Co-Evolving Policy Distillation"

Much of post-training today follows a simple recipe: train separate experts, then distill them into one model. The problem is that by the time distillation starts, the expert and the student have already drifted too far apart, so a

→ View original post on X (@askalphaxiv)