AI Dynamics

Global AI News Aggregator

LLM Coding Benchmarks Questioned on Real-World Performance

These kinds of claims never pass the sniff test. Benchmarks can be cheated, but if LLMs worked only 0-11% of the time on real tasks (which are not part of benchmarks), nobody would ever use them for coding.

→ View original post on X — @petergostev