AI Dynamics

Global AI News Aggregator

What Should Replace MMLU for AI Model Evaluation?

Final update! MMLU is saturated and has become (rightfully) less popular. What should replace it?
– Other knowledge evals like GPQA, MMLU-Pro
– Code evals like LiveCodeBench
– Agentic evals like BFCL
– Other?

→ View original post on X — @maximelabonne