Can reinforcement learning really make language models better reasoners? This study says yes, with a twist. It introduces QuestA (Question Augmentation), a strategy that feeds the model partial solutions during RL training, lowering the difficulty of hard problems and providing a richer learning signal. Applied to
QuestA: Reinforcement Learning Improves Language Model Reasoning
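As a rough illustration of the partial-solution idea, the sketch below builds an augmented prompt by prepending a prefix of a reference solution to the original question. The `Problem` class, the `augment_question` function, the `hint_fraction` parameter, and the prompt template are all hypothetical and are not taken from the QuestA paper or any library; the actual method may format, size, and schedule its hints differently.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical format: a question plus a reference solution split into steps.
    question: str
    solution_steps: list[str]


def augment_question(problem: Problem, hint_fraction: float) -> str:
    """Return the question with the first `hint_fraction` of the solution steps as a hint.

    hint_fraction is a hypothetical knob: 0.0 returns the original question
    unchanged; values closer to 1.0 reveal more of the solution, making the
    task easier for the model being trained.
    """
    if not 0.0 <= hint_fraction <= 1.0:
        raise ValueError("hint_fraction must be in [0, 1]")
    # Number of reference steps to reveal (rounded down).
    k = int(len(problem.solution_steps) * hint_fraction)
    if k == 0:
        return problem.question
    hint = "\n".join(problem.solution_steps[:k])
    # Hypothetical prompt template: question, partial solution, continuation cue.
    return f"{problem.question}\n\nPartial solution:\n{hint}\n\nContinue from here."


# Usage: reveal half of a two-step reference solution.
example = Problem(
    question="What is 2 + 3 * 4?",
    solution_steps=["First compute 3 * 4 = 12.", "Then 2 + 12 = 14."],
)
print(augment_question(example, 0.5))
```

In an RL loop, a hard problem could be presented with a larger `hint_fraction` so the model reaches a correct answer, and therefore a nonzero reward, more often. How QuestA actually chooses and schedules hint sizes is not described in this excerpt.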