At the end of the Thanksgiving holiday, I finally finished the piece on reward hacking. Not an easy one to write, phew. Reward hacking occurs when an RL agent exploits flaws in the reward function or environment to maximize reward without learning the intended behavior. This is imo a
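To make the definition concrete, here is a minimal toy sketch (not from the post itself; the environment, policies, and reward rule are all invented for illustration): a 1-D corridor where the intended task is reaching a goal, but the reward function naively pays for any movement. A policy that just oscillates farms more reward than one that actually completes the task.

```python
# Toy 1-D corridor: agent starts at position 0, goal at 5.
# Intended behavior: walk to the goal and stop.
# Flawed (proxy) reward: +1 for any step that moves the agent --
# meant to encourage progress, but it never checks direction or
# termination, so endless oscillation farms unbounded reward.

def run(policy, steps=10):
    pos, total_reward = 0, 0
    for t in range(steps):
        move = policy(t, pos)
        if move != 0:
            total_reward += 1  # flawed reward: pays for any movement at all
        pos += move
    return pos, total_reward

# Intended policy: step toward the goal, then stay put.
intended = lambda t, pos: 1 if pos < 5 else 0
# Reward-hacking policy: oscillate forever to collect movement reward.
hacker = lambda t, pos: 1 if t % 2 == 0 else -1

print(run(intended))  # (5, 5): reaches the goal, modest reward
print(run(hacker))    # (0, 10): never reaches the goal, maximum reward
```

The "hacker" policy earns twice the reward of the intended one while making zero task progress, which is the essence of the exploit: the proxy reward, not the designer's goal, is what gets maximized.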
Reward Hacking in Reinforcement Learning: Exploiting Flawed Functions