IMO-ProofBench is our key focus: a benchmark designed to evaluate how well AI models construct rigorous and valid mathematical arguments. Its 60 proof-based problems are divided into two subsets: a basic set covering pre-IMO to IMO-Medium difficulty levels, and an
IMO-ProofBench: Evaluating AI Mathematical Reasoning Capabilities