ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

This paper introduces ProRL, a method that uses long-horizon reinforcement learning to unlock new reasoning strategies in LLMs: strategies that base models cannot access, even with…
ProRL: Long-Horizon Reinforcement Learning Unlocks New LLM Reasoning
