AI Dynamics

Global AI News Aggregator

Rule-Based Rewards for AI Safety and Capability Alignment

Rule-based rewards (RBRs) use model to provide RL signals based on a set of safety rubrics, making it easier to adapt to changing safety policies wo/ heavy dependency on human data. It also enables us to look at safety and capability in a more unified lens as a more capable

→ View original post on X — @lilianweng,

Commentaires

Leave a Reply

Your email address will not be published. Required fields are marked *