AI Dynamics

Global AI News Aggregator

RLVR Models Sacrifice Diversity for Base Model Preferences

A model trained with RLVR (reinforcement learning with verifiable rewards) can fail to answer questions that its base model could previously solve. The paper "The Invisible Leash: Why RLVR May Not Escape Its Origin" shows that RLVR mainly boosts the answers the base model already favors, which raises pass@1 but sacrifices output diversity.
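To make the pass@1 versus diversity trade-off concrete, here is a minimal Python sketch using the commonly used unbiased pass@k estimator. The per-problem counts are made up for illustration and are not taken from the paper; `pass_at_k` and `mean_pass_at_k` are hypothetical helper names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c were correct.

    This is the probability that at least one of k samples, drawn
    without replacement from the n, is correct.
    """
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average pass@k over a list of per-problem correct-sample counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical numbers (assumption, not paper data): correct samples
# out of n = 100 on two problems.
N = 100
base_counts = [30, 2]  # base model: occasionally solves both problems
rlvr_counts = [90, 0]  # after RLVR: much sharper on problem 1, problem 2 lost

base_p1 = mean_pass_at_k(base_counts, N, 1)
rlvr_p1 = mean_pass_at_k(rlvr_counts, N, 1)
base_p64 = mean_pass_at_k(base_counts, N, 64)
rlvr_p64 = mean_pass_at_k(rlvr_counts, N, 64)
```

With these illustrative counts, mean pass@1 rises from 0.16 to 0.45 after RLVR, while mean pass@64 falls from about 0.94 to 0.5: the second problem, which the base model solved on rare samples, is no longer reachable however many samples are drawn.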

→ View original post on X — @askalphaxiv