AI Dynamics

Global AI News Aggregator

Intermediate model SFT data reuse for distillation

The way I read the paper, it was the intermediate model that generated the SFT data, and then they just reused that data for distillation. But please correct me if there is a counterpoint to that in the paper.

→ View original post on X (@rasbt)
