AI Dynamics

Global AI News Aggregator

AI Benchmarks Beyond Training Data: Olympiads and New Metrics

This is why the gold medals at the various math and coding Olympiads were a big deal: they were unsaturated benchmarks that weren't in the training data and that offered clear human comparisons. We are down to the various measures of task length (METR), Humanity's Last Exam (HLE), FrontierMath, vending machine operation…

→ View original post on X (@emollick)
