o3 is very performant. More importantly, the progress from o1 to o3 took only three months, which shows how fast progress will be in the new paradigm of RL on chain of thought to scale inference compute. That is way faster than the pretraining paradigm of a new model every 1-2 years.
o3 Performance Shows Rapid Progress in RL Chain-of-Thought Scaling