AI Dynamics

Global AI News Aggregator


DeepMind’s Reinforced Self-Training removes the human from RLHF

DeepMind showcases iterative self-improvement for natural language generation. They basically took the "human" out of the Reinforcement Learning from Human Feedback (RLHF) cycle and are calling the result Reinforced Self-Training (ReST). [link to paper in ALT text]

→ View original post on X — @aibreakfast