@dotcsv - AI Dynamics - Page 7 of 156

A false narrative is being shared that GPT 5.5 ties with Mythos on several benchmarks as if that made them equivalent. What's being overlooked is that GPT 5.4 was already on par with Mythos in those benchmarks, except for Terminal Bench 2.0…

→ View original post on X — @dotcsv

24 April 2026

Testing AI Model Performance Beyond Benchmarks: Real-World Evaluation

By @dotcsv – 23 April 2026 21h06

Now it's time to test the model and draw clear conclusions, not just from benchmarks but from the real vibes of working with it, and to wait for external evaluations on the remaining benchmarks from those testing via the API. We'll keep you posted.


23 April 2026

GPT 5.5 Frontend Design Issues Persist Despite Updates

By @dotcsv – 23 April 2026 21h04

In my letter to Santa Claus, I wrote that I hoped GPT 5.5 would finally fix the frontend design problem, but of course… Santa Claus doesn't come in April.

I notice slight improvements, but there they are: the blue boxes surrounding everything. https://t.co/BMoRdNcIxD
Carlos Santana (@DotCSV), 23 April 2026


23 April 2026

GPT 5.5 Outscores Gemini 3.1 Pro on ARC-AGI at Its Highest Reasoning Levels

By @dotcsv – 23 April 2026 21h00

On ARC-AGI, GPT 5.5 sits on the same cost frontier as Gemini 3.1 Pro while achieving a better score at its highest reasoning levels.


23 April 2026

OpenAI’s Rapid GPT Release Pace Raises Marketing Concerns

By @dotcsv – 23 April 2026 20h56

I'm sure GPT 5.5 will be an excellent model by vibes, just like its predecessor was. And hey, OpenAI's pace of progress is excellent when you consider that the previous one came out a month and a half ago! BUT FOR GOODNESS' SAKE, OPENAI, STOP WITH THE OVER-THE-TOP…


23 April 2026

New Model Costs 20% More Than GPT 5.4 Despite Token Efficiency

By @dotcsv – 23 April 2026 20h51

Because the model's price is double (!!) that of its predecessor, buuuut this is mitigated by its more efficient use of tokens. Combined, the result is a model that's 20% more expensive than GPT 5.4 xhigh when it comes to evaluating…
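As a rough sanity check on those two figures, here is a minimal Python sketch. It assumes the simplest possible cost model, total cost = price per token × tokens used, with a single blended price; the function name `implied_token_ratio` and that cost model are illustrative assumptions, not something stated in the post. Under that assumption, a 2x price and a 1.2x total cost imply the new model uses about 60% of the tokens its predecessor did.

```python
def implied_token_ratio(price_ratio: float, cost_ratio: float) -> float:
    """Return the token-usage ratio (new / old) implied by two other ratios.

    Assumes (hypothetically) that total cost = price per token * tokens used,
    so cost_ratio = price_ratio * token_ratio.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return cost_ratio / price_ratio


# Figures quoted in the post: price doubles (2.0x) and the evaluation
# ends up 20% more expensive (1.2x) than GPT 5.4 xhigh.
token_ratio = implied_token_ratio(price_ratio=2.0, cost_ratio=1.2)
print(f"Implied token usage: {token_ratio:.0%} of the predecessor's")
```

The derived token ratio is only as good as the simplified cost model; real pricing usually separates input and output tokens, which this sketch ignores.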


23 April 2026

New AI Model Shows Better Benchmarks Despite Higher Costs

By @dotcsv – 23 April 2026 20h44

What is clear is that it's a better model, both on benchmarks and judging by the feedback I've been seeing for days from insiders I trust who have been testing it. Bring it on. That said, it is also a more expensive model.


23 April 2026

GPT Model Performance Benchmarking and Marginal Improvements Analysis

By @dotcsv – 23 April 2026 20h36

A benchmark that captures well how models are evolving at real office work is OpenAI's own GDPval. Notice that here the model improves only marginally over the previous one, and in WINS it actually scores lower than GPT 5.4. Honestly, I think there are…


23 April 2026