AI Dynamics

Global AI News Aggregator

Benchmarking Codex Goal Feature Against Senior Engineer Tasks

i just kicked off my Senior Engineer bench on Codex's /goal feature. we'll see how well it compares to a senior engineer rewriting a slop codebase. current high score on this benchmark is 66/100, achieved by GPT-5.5 with an Opus 4.6 plan, but with an agent babysitter to make…

→ View original post on X: @danshipper