@petergostev - AI Dynamics

GPT-5.5-Pro vs Y axis: triangulating spend at $500k/week

By @petergostev – 27 June 2026, 12:35

Next in the series of GPT-5.5-Pro vs the Y axis, where we try to triangulate information to add the missing axes. Explanation:
"I anchored the top of the chart at roughly $1M/week because that makes the latest spend about $500k/week, or ~$500 per employee per month for
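The quoted reasoning is cut off, but its arithmetic can still be checked. Below is a minimal back-of-envelope sketch. It assumes the latest data point sits at half the chart's height, which is what anchoring the top at $1M/week and reading $500k/week implies. It also assumes a month is 52/12 ≈ 4.33 weeks. The function names are hypothetical. The headcount it prints is only what the two quoted figures imply under these assumptions; the post is cut off before it says whose employees are meant.

```python
# Back-of-envelope check of the figures quoted in the post.
# Assumptions (ours, not the post's): the latest data point sits at half the
# chart's height, and a month is 52/12 weeks.

WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks


def weekly_spend(chart_top_per_week: float, fraction_of_height: float) -> float:
    """Read a value off an unlabeled axis once the top of the axis is anchored."""
    return chart_top_per_week * fraction_of_height


def implied_headcount(spend_per_week: float, spend_per_employee_month: float) -> float:
    """Headcount implied by a weekly total and a per-employee monthly figure."""
    monthly_spend = spend_per_week * WEEKS_PER_MONTH
    return monthly_spend / spend_per_employee_month


latest = weekly_spend(1_000_000, 0.5)
print(f"latest spend: ${latest:,.0f}/week")
print(f"implied headcount: ~{implied_headcount(latest, 500):,.0f}")
```

If a month is counted as 4 weeks instead, the same numbers imply about 4,000 employees rather than about 4,300. The implied headcount is therefore sensitive to that convention.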

→ View original post on X — @petergostev

27 June 2026

AI models are poor researchers compared to humans

By @petergostev – 27 June 2026, 01:36

That's my general experience with the models right now: they are just not good researchers. They come up with bad ideas, calibrate them poorly, get too bogged down in the details, and don't step back to see what they've learned. Humans are way better at this.

US blocking model releases would force labs to keep their best models and vertically integrate

By @petergostev – 26 June 2026, 05:25

The logical (and unfortunate) consequence of the US government blocking model releases is that the labs would have to keep the best models to themselves and vertically integrate into intelligence-heavy industries to generate revenue.

Trying to replace Fable with an ensemble of smaller models

By @petergostev – 25 June 2026, 16:19

Me trying to replace Fable with an ensemble of smaller models pic.twitter.com/0dATTnoMbn

Jalapeño chip engineering samples run ML workloads, including GPT-5.3-Codex-Spark

By @petergostev – 24 June 2026, 15:27

Engineering samples of the Jalapeño chip are running machine learning workloads, including GPT-5.3-Codex-Spark, in the lab at target production frequency and power. I hope this is not the limit.

Entering a tedious period of LLMs claiming to equal Fable

By @petergostev – 22 June 2026, 10:43

We are now entering a tedious period where many LLMs will claim to equal Fable.

Periodic reminder to clear your agents.md and claude.md

By @petergostev – 19 June 2026, 16:17

A periodic reminder to clear out your agents.md & claude.md: if you're like me, they're probably filled with useless tips and tricks from months ago that confuse the latest models.

GLM-5.2 performs poorly on the Bullshit Benchmark, like other Series 5 models

By @petergostev – 19 June 2026, 11:34

GLM-5.2 did not perform very well on the Bullshit Benchmark, scoring at a level similar to their other Series 5* models.

Separate discussion thread for managing long-term goals

By @petergostev – 18 June 2026, 13:04

My best advice for long-term goals is to have a separate discussion thread (or /side) where you can ask the agent to review the work on your goal and debate how to update the guidelines for it so it doesn't derail.

Calling AI an atomic weapon and jailbreaks unpatchable leads to this outcome

By @petergostev – 18 June 2026, 01:14

To be fair, if one were to repeatedly describe AI as equivalent to atomic weapons and then say that patching all jailbreaks is impossible, I totally get how we'd end up here.
