fine-tuning in this context:
– SFT/BC (supervised fine-tuning and behavioral cloning) as the baseline, which is not powerful enough on its own and has limitations
– RLHF/RLAIF as the big guns
– Process supervision, à la the recent @openai @hunterlightman paper, as the newest hot technique
SFT, RLHF, and Process Supervision: Fine-tuning Techniques Compared