Elliot, are there any problems worth trying with GPT-5 Pro, do you think? Something that GPT-5 / others haven't been able to solve? Looks like Project Euler solutions are hidden
@petergostev
-
Anthropic reasoning traces differ from OpenAI and DeepSeek approaches
When you look at the reasoning traces of Anthropic models, they read to me very unlike the OpenAI / DeepSeek reasoning. Anthropic's look a lot more like planning than, e.g., deliberating on the answer and backtracking, as OpenAI and DeepSeek do. It feels different
-

Opus 4.1 costs 3x as much as GPT-5 and Gemini 2.5 Pro
I ran a semi-scientific cost test of 3 top models: Opus 4.1 was 3x the cost of GPT-5 and Gemini 2.5 Pro on real tasks. I gave the same prompt to each model (with reasoning enabled) and looked at the cost, without normalising for tokens. The point was to capture real cost and
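A rough sketch of how a per-run cost comparison like this could be computed, assuming each run reports input and output token counts and each model has a per-million-token price. The names (`Usage`, `Price`, `run_cost`, `relative_costs`) and all numbers are hypothetical placeholders, not the figures from the test.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int  # assumed to include reasoning tokens billed as output


@dataclass
class Price:
    input_per_mtok: float   # USD per 1M input tokens (placeholder value)
    output_per_mtok: float  # USD per 1M output tokens (placeholder value)


def run_cost(usage: Usage, price: Price) -> float:
    """Actual cost of one run: tokens used times the per-token price."""
    total = (usage.input_tokens * price.input_per_mtok
             + usage.output_tokens * price.output_per_mtok)
    return total / 1_000_000


def relative_costs(costs: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every model's cost as a multiple of the baseline model's cost."""
    return {name: cost / costs[baseline] for name, cost in costs.items()}


if __name__ == "__main__":
    # Same prompt for each model; token counts and prices are made up.
    costs = {
        "model_a": run_cost(Usage(1_000, 4_000), Price(15.0, 75.0)),
        "model_b": run_cost(Usage(1_000, 8_000), Price(1.25, 10.0)),
    }
    for name, multiple in relative_costs(costs, "model_b").items():
        print(f"{name}: {multiple:.2f}x")
```

Comparing the total cost per run, rather than cost per token, lets a model that is cheaper per token but writes longer reasoning show up at its real price.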
-

GPT-5 Pro Evaluates the Accuracy of Situational Awareness Claims
I got GPT-5 Pro to research Situational Awareness by @leopoldasch and assess whether his claims are True, Tracking True, Tracking False, False or Not Enough Info. So far it is doing pretty well: 8 out of 13 are True or Tracking True, and only 1 is Tracking False. Including the link below
-
GPT-4o-mini Token Usage Spike: Large User Activity Theory
If you go back further, you can see that gpt-4o-mini tokens went up for a couple of months and then went down. My guess is that it was just someone with big usage milking gpt-4o-mini. Doubt this actually means anything
-

GPT-5 Thinking dominates web development benchmarks with 65% win rate
GPT-5 (Thinking) tops the Web Dev @lmarena_ai, winning ~65% of matchups and losing ~20%. This benchmark focuses on front-end development, fairly simple apps, so it is not representative of all coding. But OpenAI have made huge strides in front-end, given that their previous best
-
GPT-5 Pro Competence Compared to o1 and o3 Models
GPT-5 Pro is under-hyped. Pretty much every time I try it, I’m surprised by how competent and coherent the response is.
– o1-pro was an incredible model, way ahead of its time, way better than o1
– o3 was better because of its search
– o3-pro was a little disappointing
-

GPT-5 Update: Router Impact on Perceived Intelligence for ChatGPT Users
The level of intelligence most people experience in @ChatGPTapp after the GPT-5 update will depend heavily on the 'router' and how often it switches into 'thinking' mode. If you were using o3 already, you are unlikely to feel a huge difference. However, the vast majority of ChatGPT users