So, to solve this and make the answers more relevant and safer, they used the same "Reinforcement Learning from Human Feedback" (RLHF) method to fine-tune InstructGPT (GPT-3.5). Let's walk through how RLHF works; it is a 3-step process. 4/9
Understanding RLHF: Three-Step Process for Fine-tuning InstructGPT