These kind of claims never pass the sniff test. Benchmarks can be cheated, but if it worked 0-11% of the time on real tasks (which are not part of benchmarks) nobody would ever use LLMs for coding. https://t.co/zp1qpQjf3P
— Peter Gostev (@petergostev) March 19, 2026