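Later, reinforcement learning was used to automate the task of human evaluation as well: a model trained on human ratings could stand in for the human raters. This yielded an endless fountain of tuning data that could be produced by machine, a process known as RLHF (reinforcement learning from human feedback).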
RLHF automates human evaluation and generates tuning data
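To make the idea of automated evaluation concrete, here is a minimal sketch of the reward-model half of RLHF, written in plain Python. It assumes each response has already been reduced to a small feature vector, and every name in it (`train_reward_model`, `human_preferences`, the two features) is hypothetical and chosen only for illustration. Real systems use a neural network over text rather than a linear model, and the reinforcement learning step that tunes the language model against these scores is not shown.

```python
import math

def reward(weights, features):
    """Linear reward model: a higher score means a human would likely prefer it."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit weights on (preferred, rejected) pairs with a pairwise logistic loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in preferences:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Probability the model assigns to the choice the human actually made.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log(p): push the preferred response's score up.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights

# Hypothetical human-labelled comparisons. Each response is a feature vector
# (helpfulness, rudeness); the human preferred the first item of each pair.
human_preferences = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(human_preferences, n_features=2)

# The trained model can now score new responses with no human in the loop.
candidates = {"polite_answer": [0.8, 0.1], "rude_answer": [0.5, 0.9]}
scores = {name: reward(weights, feats) for name, feats in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The human ratings are only needed once, to fit the reward model. After that, the model can score as many machine-generated responses as desired, which is where the large supply of tuning data comes from.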
