AI Dynamics

Global AI News Aggregator

GPT-3’s Imitation of Pathological Preferences in Reward Models Discussed

To be fair, the point Gwern's making (not shown in the quote) isn't just "I don't like it" — it's that GPT-3 is capable of imitating pathological preferences in the reward model.

→ View original post on X — @goodside