Not enough people are talking about the delta between the parallel-thinking uplifts of OpenAI (o3 pro) vs Google DeepMind (Deep Think).

AIME
o3 pro: +3% (from 90 -> 93 on AIME 2024)
Deep Think: +11.2% (from 88 -> 99.2 on AIME 2025)

Knowledge
o3 pro: +3% (on GPQA)
Deep Think: +13.2% (on HLE, Humanity's Last Exam)

Coding
o3 pro: +9.1% (on Codeforces)
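The AIME uplifts quoted here are absolute gains in percentage points (90 -> 93 is +3 points). A small Python sketch makes that arithmetic explicit and shows two other ways to read the same scores: the gain relative to the starting score, and the share of previously missed questions that parallel thinking recovers. The `Uplift` class and its method names are illustrative assumptions, not from any library, and only the AIME figures above are used. Note that o3 pro was measured on AIME 2024 and Deep Think on AIME 2025, so the two rows are not strictly like-for-like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplift:
    """Hypothetical helper: a 0-100 benchmark score before and after parallel thinking."""
    model: str
    before: float
    after: float

    def points(self) -> float:
        # Absolute gain in percentage points (what "+3%" reports for 90 -> 93).
        return self.after - self.before

    def relative_pct(self) -> float:
        # Gain expressed as a percentage of the starting score.
        return 100.0 * self.points() / self.before

    def error_reduction_pct(self) -> float:
        # Share of the previously missed questions that are now solved.
        return 100.0 * self.points() / (100.0 - self.before)


# Scores taken from the AIME lines above (different AIME years per model).
AIME = [
    Uplift("o3 pro (AIME 2024)", 90.0, 93.0),
    Uplift("Deep Think (AIME 2025)", 88.0, 99.2),
]

if __name__ == "__main__":
    for u in AIME:
        print(f"{u.model}: +{u.points():.1f} pts, "
              f"+{u.relative_pct():.1f}% relative, "
              f"{u.error_reduction_pct():.1f}% of errors removed")
```

Near the top of the scale, the error-reduction view separates the two most clearly: going from 88 to 99.2 removes about 93% of the remaining misses, versus 30% for going from 90 to 93.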
DeepThink vs O3 Pro: Dramatic Performance Gap Analysis