First, here are the Grok 4 benchmarks:
Humanity’s Last Exam: 44.4% (vs. 26.9% for Gemini 2.5 Pro)
ARC-AGI-2: 15.9% (vs. 8.6% for Claude Sonnet 4)
LiveCodeBench: 79.4% (vs. 75.8% for Gemini 2.5 Pro)
That makes Grok 4 SOTA on these benchmarks.
Grok 4 Achieves SOTA Performance on Major AI Benchmarks