To be fair, the point Gwern's making (not shown in the quote) isn't just "I don't like it" — it's that GPT-3 is capable of imitating pathological preferences in the reward model.
GPT-3’s Imitation of Pathological Preferences in Reward Models Discussed
By
–