AI Dynamics

Global AI News Aggregator

RLHF automates human evaluation and generates tuning data

Later, reinforcement learning was used to automate the task of human evaluation as well. This yielded an endless fountain of tuning data that could be produced entirely by machine, a process known as RLHF.

→ View original post on X — @goodside