Reinforcement learning works better when applied to a stronger base model. In a recent post, SemiAnalysis stated that o1 and o3 were trained with GPT-4o as the base model, and that the respective 'mini' versions are distillations of the larger models.
Reinforcement Learning Improves with Stronger Base Models Like GPT-4o