It is tagged as o3-alpha on the arena
@petergostev
-
FeatureCrewPod Channel Tests AI Models on Coding Tasks
You should subscribe to the @FeatureCrewPod YouTube channel; they are doing great work testing models on a bunch of different coding & analytical tasks: https://youtube.com/@FeatureCrewPod
-
Web Dev Arena Battle Mode: You Need Luck to Get the Test Model
It would be in the Web Dev Arena (https://webdev.lmarena.ai), and only in battle mode, so you need to get lucky.
-
GPT-4.5 and Suno Integration Sparks Creative Interest
Haha, I kind of like it though: GPT-4.5 + Suno.
-
Long Thinking Time Suggests a Large Model
Yeah, I can't imagine this is a small model; the thinking time was quite long.
-
OpenAI Tests Anonymous Chatbot 0717 Web Development Model
OpenAI are testing a new model on the Web Dev Arena @lmarena_ai under the name 'Anonymous Chatbot 0717'. I can't believe I'm gonna say this, but it is genuinely at a completely different level of front-end coding – far better than Sonnet, o3, Gemini 2.5 Pro, or Grok 4.
— Peter Gostev (@petergostev) July 18, 2025
To test… pic.twitter.com/wQKMgPRFGF
-
OpenAI’s New Agent Impresses with UI and Long-term Coherence
My favourite thing about @OpenAI's new agent might be superficial, but I love the UI – it shows exactly what it is doing without overloading you with information, and it does so in a smooth and beautiful way.
— Peter Gostev (@petergostev) July 17, 2025
Apart from the UI, it is also impressively coherent over a long period… pic.twitter.com/y7ys0iyrNs
-
Surprise Soft Launch of a "Fiver"
Can't believe they soft-launched a fiver like this.
-

Grok 4 Performance Disappoints in Latest AI Benchmarks
As more benchmarks come in, Grok 4’s shine begins to fade more and more. Now with @lmarena_ai scores out, we have another example where Grok 4 fell below expectations. It scored 4th overall (with style control on), and a pretty surprising #12 on the Web Arena, which tests for

