AI Dynamics

Global AI News Aggregator

Reward Hacking in Reinforcement Learning: Exploiting Flawed Functions

At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Reward hacking occurs when an RL agent exploits flaws in the reward function or env to maximize rewards without learning the intended behavior. This is imo a

→ View original post on X by @lilianweng
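
To make the definition above concrete, here is a minimal toy sketch, not taken from the original post. It assumes a hypothetical five-tile corridor whose reward function pays +1 every time the agent steps onto a checkpoint tile and +5 for reaching the goal. The names `step`, `run`, `intended_policy`, and `hacking_policy` are illustrative, and both policies are hand-written rather than learned. Because the checkpoint bonus can be collected repeatedly, a policy that bounces on and off the checkpoint out-scores one that actually reaches the goal.

```python
# Toy corridor (hypothetical): positions 0..4, start at 0, goal at 4.
# Flawed reward: +1 for every step onto the checkpoint tile (position 1),
# +5 for reaching the goal, which ends the episode.
GOAL, CHECKPOINT, MAX_STEPS = 4, 1, 20


def step(pos, action):
    """Apply an action (-1 or +1) and return (new_pos, reward, done)."""
    new_pos = min(max(pos + action, 0), GOAL)
    if new_pos == GOAL:
        return new_pos, 5, True
    reward = 1 if new_pos == CHECKPOINT else 0
    return new_pos, reward, False


def run(policy):
    """Roll out a policy for at most MAX_STEPS; return (total_reward, reached_goal)."""
    pos, total = 0, 0
    for t in range(MAX_STEPS):
        pos, reward, done = step(pos, policy(pos, t))
        total += reward
        if done:
            return total, True
    return total, False


def intended_policy(pos, t):
    return +1  # always walk toward the goal


def hacking_policy(pos, t):
    return +1 if pos == 0 else -1  # bounce between tiles 0 and 1


intended = run(intended_policy)  # reaches the goal
hacked = run(hacking_policy)     # never reaches the goal, yet earns more reward
print("intended:", intended)
print("hacked:  ", hacked)
```

In this sketch the flaw is that the checkpoint bonus is repeatable. Paying it only once per episode would remove this particular exploit, though real reward functions tend to fail in less obvious ways.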
