Distilled Models Gain from Online RL Training Beyond Initial Distillation

We found that a model distilled on reasoning traces from a larger model still benefits substantially from additional online RL training. In particular, we trained Magistral Small by distilling it from Magistral Medium and then running additional RL on top.
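The recipe described here is a two-stage pipeline: supervised distillation on a teacher's outputs, followed by online RL against a reward signal. Below is a minimal toy sketch of that idea in Python (numpy only). It is not Mistral's actual training code: the bandit environment, the teacher construction, and every hyperparameter are illustrative assumptions. A softmax policy is first distilled toward an imperfect teacher via cross-entropy, then improved past the teacher with REINFORCE.

```python
# Toy sketch (not Mistral's pipeline): distill a softmax policy from an
# imperfect teacher, then fine-tune it with online RL (REINFORCE).
# All names and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K = 10                                   # number of discrete actions
true_reward = rng.normal(size=K)         # verifiable reward per action

# An imperfect teacher: softmax over a noisy view of the true rewards.
teacher_logits = true_reward + 0.5 * rng.normal(size=K)
teacher = np.exp(teacher_logits) / np.exp(teacher_logits).sum()

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_reward(logits):
    return softmax(logits) @ true_reward

student = np.zeros(K)                    # student policy logits

# Stage 1 -- distillation: gradient descent on the cross-entropy to the
# teacher's action distribution (gradient w.r.t. logits is p - teacher).
for _ in range(500):
    grad = softmax(student) - teacher
    student -= 0.5 * grad

print(f"after distillation: {expected_reward(student):.3f} "
      f"(teacher: {teacher @ true_reward:.3f})")

# Stage 2 -- online RL: REINFORCE with a moving-average baseline on the
# true reward, which can push the student beyond the teacher.
baseline, lr = 0.0, 0.1
for _ in range(2000):
    p = softmax(student)
    a = rng.choice(K, p=p)
    r = true_reward[a] + 0.1 * rng.normal()   # noisy reward signal
    baseline += 0.05 * (r - baseline)
    grad_logp = -p                            # grad of log pi(a) w.r.t. logits
    grad_logp[a] += 1.0
    student += lr * (r - baseline) * grad_logp

print(f"after online RL:    {expected_reward(student):.3f} "
      f"(optimum: {true_reward.max():.3f})")
```

In this sketch, distillation brings the student's expected reward up to roughly the teacher's level, and the RL stage then closes most of the remaining gap to the optimal action, mirroring the claim that online RL adds gains beyond distillation alone.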

→ View the original post on X: @guillaumelample
