@petergostev - AI Dynamics

New AI Models Compared: Lobster, Nectarine, Starfish Performance

By @petergostev – 25 July 2025, 12:45

New models on the @lmarena_ai WebDev arena:
– Lobster
– Nectarine
– Starfish (not in this video)

In the video, they are compared to the 'Anonymous Chatbot' (aka o3-Alpha) from 17 July.

Observations:
– Lobster is closest to o3-Alpha, but nowhere near as good
– Nectarine was… pic.twitter.com/BmSaoUXmuA
– Peter Gostev (@petergostev), 25 July 2025

→ View original post on X — @petergostev

25 July 2025

Ideogram Still Behind on Image Generation Metric; Midjourney Hard to Benchmark

By @petergostev – 25 July 2025, 7:35

Ideogram is still behind on this metric. Midjourney would be nice to test, but they don't have an API, so that kind of benchmarking is harder to do.

→ View original post on X — @petergostev

25 July 2025

Image Generation Model Progress: DALL-E 2 to Midjourney 6

By @petergostev – 24 July 2025, 23:25

This chart shows the best image generation model at any given time, based on the @ArtificialAnlys arena and the model release dates. A few points stand out:
– Massive gains from DALL-E 2 up to Midjourney 6.
– Arguably a slowdown in progress for diffusion models since then
– …

→ View original post on X — @petergostev

24 July 2025

ChatGPT Product Recommendations: Addressing Hallucinations and Link Issues

By @petergostev – 24 July 2025, 21:51

Can't wait for this. I'm so tired of asking ChatGPT for product recommendations and having it either give no links (or some irrelevant ones) or hallucinate product links.

→ View original post on X — @petergostev

24 July 2025

We wanted GPT-5 and got pink messages

By @petergostev – 24 July 2025, 20:13

We wanted GPT-5 and got pink messages

→ View original post on X — @petergostev

24 July 2025

Image Generation Market Less Competitive Than LLMs

By @petergostev – 24 July 2025, 0:37

Unlike with LLMs, the image generation market is a bit less competitive – there are many players, but they are not constantly breaking new ground, and some vendors haven't released a competitive model in many months. In this data from @ArtificialAnlys image arena, we can see

→ View original post on X — @petergostev

24 July 2025

Agent Model Tuning Receives Praise from Community

By @petergostev – 23 July 2025, 17:38

@isafulf I don't know who is tuning the Agent model, but I like them already

→ View original post on X — @petergostev

23 July 2025

New Pelican Riding Bicycle Agent Benchmark Released

By @petergostev – 23 July 2025, 16:15

New 'Pelican Riding a Bicycle' agent benchmark just dropped @simonw

video @ 2x speed pic.twitter.com/5qnuMVbMMR
– Peter Gostev (@petergostev), 23 July 2025

→ View original post on X — @petergostev

23 July 2025

AI-Powered Slides and Spreadsheets: Missing Professional Tool

By @petergostev – 23 July 2025, 10:25

We still have not had this moment for slide and spreadsheet creation. I think SF-based research teams don't appreciate how big a deal that is for much of the professional world. I personally can't wait, but at some point this was a big part of my identity

→ View original post on X — @petergostev

23 July 2025

Subintelliphobia: Fear of Limited Access to Advanced AI Models

By @petergostev – 22 July 2025, 22:50

I propose a new term ‘subintelliphobia’: the anxiety or fear of not being able to access the highest available intelligence, e.g. when hitting a rate limit for the smartest model

→ View original post on X — @petergostev

22 July 2025