Oh yeah, you probably will. I am currently running tons of experiments to decide which of the Olmo 3 and DeepSeek V3.2 GRPO tweaks I should add to the final version.
No side objective really, except that I'd do an honest evaluation to make sure the model actually learns well.
Experimenting with Olmo 3 and DeepSeek V3.2 GRPO tweaks