AI Dynamics

Global AI News Aggregator

CGPO: Mixture of Judges Outperforms RLHF Approaches

New paper from GenAI and Meta FAIR. CGPO (Constrained Generative Policy Optimization) uses a Mixture of Judges and consistently outperforms state-of-the-art RLHF approaches across various tasks. More details and key results are in the full thread.

→ View original post on X: @aiatmeta