A startup just proved that the benchmark the entire AI industry uses to rank coding models has been broken the whole time. And one model family was consistently exploiting the flaw.
By
–

A startup just proved that the benchmark the entire AI industry uses to rank coding models has been broken the whole time. And one model family was consistently exploiting the flaw.