AI Dynamics

Global AI News Aggregator

SFT, RLHF, and Process Supervision: Fine-tuning Techniques Compared

Fine-tuning in this context:
– SFT/BC (supervised fine-tuning and behavioral cloning) as a baseline that is not powerful enough and has limitations
– RLHF/RLAIF as the big cannons
– Process supervision, à la the recent @openai @hunterlightman paper, as the newest hot technique
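
To make the last item concrete, here is a minimal sketch of how process supervision differs from outcome supervision in the labels it gives a reward model. Everything in it is an assumption for illustration: the `Solution` class, the toy algebra steps, the labels, and the probabilities are hypothetical, and scoring a solution as the product of per-step correctness probabilities is just one simple choice.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Solution:
    steps: list[str]           # reasoning steps produced by the model
    step_correct: list[bool]   # hypothetical per-step human labels
    final_correct: bool        # hypothetical label for the final answer only


def outcome_labels(sol: Solution) -> list[bool]:
    """Outcome supervision: a single label for the whole solution."""
    return [sol.final_correct]


def process_labels(sol: Solution) -> list[bool]:
    """Process supervision: one label per intermediate step."""
    return list(sol.step_correct)


def solution_score(step_probs: list[float]) -> float:
    """Score a solution as the probability that every step is correct,
    taken here as the product of per-step probabilities (an assumption)."""
    return prod(step_probs)


# Toy example: the final answer is right, but the middle step is wrong.
sol = Solution(
    steps=["2x + 3 = 11", "2x = 9", "x = 4"],
    step_correct=[True, False, True],
    final_correct=True,
)

print(outcome_labels(sol))   # outcome supervision sees only the final answer
print(process_labels(sol))   # process supervision flags the faulty step
print(solution_score([0.9, 0.2, 0.9]))  # one low step drags the score down
```

In this toy case an outcome label would reward the solution, while the per-step labels expose the bad intermediate step.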

→ View original post on X — @alexandr_wang