This paper gives RL its own “Kaplan moment.” Pretraining has had scaling laws since Kaplan et al. (2020); now RL has them too. If the result generalizes, post-training can finally move from “try random tricks” to “predict compute curves.” The science of post-training has begun. Read the full paper here:
New research establishes scaling laws for reinforcement learning