The irony: with the help of AI (GPT 5.5), errors have been identified that would affect one third of the problems in Tiers 1-4 of the FrontierMath benchmark. This could shift evaluations of AI's mathematical capabilities, which may have been underestimated.
