AI Dynamics

Global AI News Aggregator

RLHF vs RLVR: From Likable to Useful

TL;DR: RLHF (reinforcement learning from human feedback) taught models to be likable. RLVR (reinforcement learning with verifiable rewards) is teaching them to be useful.

→ View original post on X — @whats_ai