AI Dynamics

Global AI News Aggregator


AI Benchmarks Beyond Training Data: Olympiads and New Metrics

This is why the gold medals at the various math and coding Olympiads were a big deal: they were unsaturated benchmarks, absent from the training data, with clear human comparisons. We are down to the various measures of task length (METR), HLE (Humanity's Last Exam), FrontierMath, vending machine operation…

Original post on X by @emollick