Must Know Benchmarks and Evals:
Knowledge: @hendrycks's MMLU and MATH, @idavidrein's GPQA and BIG-Bench and their polyunsaturated 2025 variants. Ditto Math lvl 5, AIME, @tamaybes's FrontierMath, etc.
Long Context: @ZayneSprague's MuSR, @realYushiBai's LongBench,
Essential AI Benchmarks and Evaluation Metrics for LLMs