Pretty shocking result (that once again confirms what I wrote about the perils of distribution shift, 25 years ago):
— Gary Marcus (@GaryMarcus) March 19, 2026
Translate coding benchmarks into languages LLMs can’t memorize and performance utterly falls apart. https://t.co/wu5fh57nLZ