AI Dynamics

Global AI News Aggregator

SoTA Model Benchmarks Reveal Evaluation Methodology Flaws

The evals that folks care about and publish at any time are those that SoTA models are just shy of being good at. So this is what such pics *always* look like.

→ View original post on X: @jeremyphoward