Thanks! Right now, the evidence is a bit inconclusive. Lots of papers show that supervised instruction-finetuning alone is sufficient (compared with RLHF-based instruction-finetuning). I can't quite put my finger on it, but I feel like we don't have good evaluations yet, so this is not the end of the story.
Instruction-Finetuning vs RLHF: Inconclusive Research Findings