AI Dynamics

Global AI News Aggregator

Qwen2.5 and DeepSeek-V2.5 Performance Comparison on MMLU

Updated version of the graph with @Alibaba_Qwen's Qwen2.5-72B and @deepseek_ai's DeepSeek-V2.5. I didn't add o1 because these models have built-in CoT reasoning and maybe other stuff that makes the comparison unfair. For example, Gemini Ultra's MMLU score goes from 83.7% with

→ View original post on X: @maximelabonne