AI Dynamics

Global AI News Aggregator

ScaleRL outperforms in RL scaling tests

They tested this with 400,000 GPU hours across multiple RL recipes: DeepSeek (GRPO), Qwen (DAPO), Magistral, and Minimax.

Result: only ScaleRL showed a stable, predictable trajectory. The others broke scaling laws entirely; their curves collapsed.

→ View original post on X — @godofprompt