7). Open-Reasoner-Zero – an open-source, large-scale, minimalist RL framework for enhancing reasoning. It scales efficiently, requiring only 1/30th of the training steps of DeepSeek-R1-Zero-Qwen-32B to outperform it on GPQA Diamond.
Open-Reasoner-Zero: Efficient RL Framework Outperforms DeepSeek-R1
