AI Dynamics

Global AI News Aggregator

AI Influencers Misunderstand METR Benchmark Results for Sonnet

It worries me how many AI influencers are surprised that Sonnet 4.5 didn't rank higher on the METR benchmark after they had claimed it "could work autonomously for 30 hours." They don't understand what these benchmarks actually measure. They're

→ View original post on X — @dotcsv