AI Dynamics

Global AI News Aggregator


SimPO: Reference-Free Preference Optimization for Language Models

SimPO is a simpler and more effective approach to preference optimization that uses a reference-free reward. It takes the average log probability of a sequence as the implicit reward, so no reference model is required. That makes it more compute- and memory-efficient than approaches that must keep a reference model loaded.
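To make the idea of an implicit, reference-free reward concrete, here is a minimal Python sketch. It assumes you already have per-token log probabilities for a preferred and a rejected response from the policy model. The post itself only mentions the average log probability. The `beta` scale and the `gamma` target margin follow the formulation in the SimPO paper as I understand it, and the function names, default values, and numbers below are purely illustrative.

```python
import math


def implicit_reward(token_logprobs, beta=1.0):
    """Average per-token log probability of a response, scaled by beta.

    No reference model appears here: the reward depends only on the
    policy's own log probabilities. `beta` is an assumed scaling factor.
    """
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return beta * sum(token_logprobs) / len(token_logprobs)


def preference_loss(chosen_logprobs, rejected_logprobs, beta=1.0, gamma=0.0):
    """Negative log-sigmoid of the reward gap minus a target margin gamma."""
    gap = (
        implicit_reward(chosen_logprobs, beta)
        - implicit_reward(rejected_logprobs, beta)
        - gamma
    )
    # -log(sigmoid(gap)) written in a numerically stable form.
    if gap >= 0:
        return math.log1p(math.exp(-gap))
    return -gap + math.log1p(math.exp(gap))


# Made-up per-token log probabilities, for illustration only.
chosen = [-0.2, -0.4, -0.3]
rejected = [-1.0, -1.2, -0.8, -1.0]

print(implicit_reward(chosen))
print(implicit_reward(rejected))
print(preference_loss(chosen, rejected))
```

Because the reward is averaged over tokens, a longer response does not score lower simply for having more tokens, and only one model has to be evaluated per response.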

→ View original post on X — @dair_ai