It was never a secret. The alignment tax (RLHF hurts performance on NLP benchmarks) is mentioned in the InstructGPT paper, Jan 2022. It got more notice after "Mysteries of Mode Collapse", Nov 2022. (My Mask joke was post-shoggoth; people hated RLHF well before that.)
Alignment Tax: RLHF’s Impact on NLP Performance Discussed