As more benchmarks come in, Grok 4’s shine keeps fading. Now that the @lmarena_ai scores are out, we have another example of Grok 4 falling below expectations. It placed 4th overall (with style control on) and a pretty surprising #12 on the Web Arena, which tests for
