Final update! MMLU is saturated and has become (rightfully) less popular. What should replace it?
– Other knowledge evals like GPQA, MMLU-Pro
– Code evals like LiveCodeBench
– Agentic evals like BFCL
– Other?
What Should Replace MMLU for AI Model Evaluation?