AI Dynamics

Global AI News Aggregator

Instant agreement is a learned behavior

Instant agreement is a trained behavior, not a judgment. Reinforcement learning from human feedback (RLHF) rewarded models for being agreeable, and instant agreement is the side effect.

→ View original post on X (@aihighlight)