[Slides] This is one of the earliest papers on RLHF (if not the first, alongside InstructGPT). Before RLHF, language models didn't really have personalities. They mostly relied on supervised fine-tuning or clever prompting to understand what humans wanted. Think back to the InstructGPT days.
Early RLHF Paper: From Supervised Fine-tuning to Personality in Language Models