AI Dynamics

Global AI News Aggregator


AI Code Generation Benchmarks Miss Human Review Bottleneck

These kinds of benchmarks are misleading without a joint metric showing how much work humans had to do after the fact. How much time does it take to clean up that 2h42m of code? Style and architecture need to make sense; passing tests is not enough. That's the bottleneck now: reviewing!

→ View original post on X: @alexjc