AI Dynamics

Global AI News Aggregator

DAPO: Open-Source RL Training for Qwen2.5-32B LLMs

DAPO is a fully open-sourced reinforcement learning system for LLMs. Applied to Qwen2.5-32B, it reaches 50 points on AIME 2024, outperforming DeepSeek-R1-Zero while using only 50% of the training steps.
Highlights: open-source code and dataset, improved training stability, and trending #1 on alphaXiv.

→ View original post on X (@askalphaxiv)