o4-mini beats Grok 4 on SciCode, despite being much faster and cheaper. It feels like Grok 4 may have been fine-tuned specifically for a handful of popular, high-status tasks. (Still fairly impressive, but more of a party trick than real capability, AFAICT.)
o4-mini Outperforms Grok 4 on SciCode Benchmark