AI Dynamics

Global AI News Aggregator

DPO vs PPO: Lab Training Method Preferences Revealed

I'd say it's traditional at this point. I would have expected them to use something closer to full RL, like PPO, but interestingly they chose DPO instead. It also shows that, despite all the *PO papers, most labs still use DPO variants.

→ View original post on X (@maximelabonne)