AI Dynamics

Global AI News Aggregator

Two-stage SFT outperforms single-stage setup on the same data

Also super interesting in terms of SFT: a two-stage setup outperforms a single-stage setup trained on the same data. Curious to see whether that still holds after preference alignment. In general, the lack of any DPO experiment feels like a missed opportunity.

→ View original post on X: @maximelabonne
