6/ Reasoning with Reinforced Fine-Tuning – proposes ReFT, an approach to improve how well LLMs generalize on reasoning tasks. It first warms up the model with SFT, then refines it with online RL, automatically sampling reasoning paths for the model to learn from.
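To make the two-stage recipe concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the "model" is just a softmax over four hypothetical reasoning paths, the reward is assumed to be 1 when a sampled path's final answer matches the ground truth, and a plain REINFORCE update stands in for whichever online RL algorithm is actually used. All names (`PATHS`, `sft_warmup`, `online_rl`, `run_reft_sketch`) are invented for illustration.

```python
import math
import random

# Hypothetical toy "reasoning paths": each maps a question (a, b) to a final answer.
PATHS = {
    "a+b": lambda a, b: a + b,
    "b+a": lambda a, b: b + a,  # a different but equally valid path
    "a*b": lambda a, b: a * b,
    "a-b": lambda a, b: a - b,
}
NAMES = list(PATHS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_warmup(logits, annotated_path, steps=20, lr=0.5):
    """Stage 1 (SFT): gradient ascent on the log-likelihood of one annotated path."""
    target = NAMES.index(annotated_path)
    for _ in range(steps):
        probs = softmax(logits)
        for i in range(len(logits)):
            logits[i] += lr * ((1.0 if i == target else 0.0) - probs[i])
    return logits


def online_rl(logits, questions, steps=500, lr=0.2, seed=0):
    """Stage 2 (online RL): sample a path from the current policy, reward it if its
    final answer matches the ground truth, and apply a REINFORCE update.
    REINFORCE is an assumption here, used as a simple stand-in."""
    rng = random.Random(seed)
    for _ in range(steps):
        a, b, truth = rng.choice(questions)
        probs = softmax(logits)
        k = rng.choices(range(len(NAMES)), weights=probs)[0]
        reward = 1.0 if PATHS[NAMES[k]](a, b) == truth else 0.0
        for i in range(len(logits)):
            logits[i] += lr * reward * ((1.0 if i == k else 0.0) - probs[i])
    return logits


def run_reft_sketch():
    """Run both stages and return path probabilities after SFT and after RL."""
    questions = [(2, 3, 5), (4, 1, 5), (7, 2, 9)]  # (a, b, ground-truth answer)
    logits = [0.0] * len(NAMES)
    sft_warmup(logits, annotated_path="a+b")
    after_sft = dict(zip(NAMES, softmax(logits)))
    online_rl(logits, questions)
    after_rl = dict(zip(NAMES, softmax(logits)))
    return after_sft, after_rl
```

Calling `run_reft_sketch()` returns two dictionaries of path probabilities. The point of the toy is the division of labor: SFT only ever sees the single annotated path, while the RL stage scores whatever paths the policy itself samples, so any path that reaches the correct answer can earn reward.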
ReFT: Enhancing LLM Reasoning Through Reinforced Fine-Tuning
