GPT-4.1 nearly matches Claude 3.7 in coding?! New evaluation published with our #1 agent on the SWE-coding benchmark! GPT-4.1 outperforms Gemini 2.5 Pro and comes close to the level of Claude 3.7 Sonnet! Even GPT-4.1 mini matches the performance of Claude 3.5 Sonnet.
GPT-4.1 Nearly Matches Claude 3.7 on SWE Coding Benchmark
