New post: RLHF – Reinforcement Learning from Human Feedback

Discussing 3 phases of ChatGPT development, where RLHF fits in, how RLHF works, hypotheses on why it works, and relationship between RLHF and hallucination.

https://huyenchip.com/2023/05/02/rlhf.html
…
RLHF: Reinforcement Learning from Human Feedback Explained
