Score-DPO is such a clean idea. Instead of just asking “which one’s better?”, it asks “how much better?” That gives a smarter gradient signal, and it shows in the results.