The codex-1 model is state of the art on SWE-Bench Verified; we published the numbers on the blog. More importantly, we optimized it to generate code that people actually want to merge, not just code that scores well on benchmarks.
Codex-1 Achieves State-of-the-Art Performance on SWE-Bench Verified