Asynchronous RL for LLM training. Meituan Longcat fixes the rollout bottleneck from long reasoning traces by keeping multiple policy versions alive at once. Long trajectories can now stay on their original policy, so training can keep moving without dropping samples or breaking
Meituan Longcat Introduces Asynchronous RL for LLM Training
