I'd say it's traditional at this point. I would have expected them to use something more RL-like, such as PPO, so it's interesting that they chose DPO instead. It also shows that, despite all the *PO papers, most labs still use DPO variants.