AI Dynamics

Global AI News Aggregator

GPT-5.5 reveals errors in FrontierMath benchmark problems

The irony: an AI model (GPT-5.5) has been used to identify errors that would affect one third of the problems in Tiers 1-4 of the FrontierMath benchmark. This could change how AI's mathematical capabilities are evaluated, since scores measured against flawed problems may have underestimated them.

→ View original post on X — @dotcsv