I find it quite curious that LLMs gravitate towards laziness. As far as I know, reasoning LLMs are not penalised for using more compute, nor is o3 here actually using less compute when it takes five turns to deliberate whether it should do the task properly. Maybe training on
@petergostev
-

Video Generation AI Models: Hailuo 2 and Seedance 1 Challenge Veo 3
The video generation market is heating up: recent models like Hailuo 2 and Seedance 1 offer big quality gains while costing roughly 3-4x less than Veo 3. Veo 3's unique native audio capability might set a new standard if others follow. Elo data from @ArtificialAnlys; pricing from @FAL and
-
Video Generation Models Compared: Top AI Tools Ranked
Comparing video generation models with dog gymnastics: @Hailuo_AI Hailuo 2, @BytedanceTalk Seedance 1, @Kling_ai Kling 2.1, @GoogleDeepMind Veo 3, @runwayml Gen 4, @OpenAI Sora pic.twitter.com/XVyK8JkevM
Peter Gostev (@petergostev), 24 June 2025
-
H100 Prices Expected to Fall to $1-1.50 Mark
I expect H100 prices will keep falling, probably to around $1.50 per GPU-hour, or maybe even $1 at some point. This would be a major boost for open source models and smaller use cases, which tend to get squeezed out when GPUs are scarce and expensive. Data from Bloomberg's
-

GPU Rental Prices Signal DeepSeek Competition and Market Shifts
GPU rental prices are a handy indicator of what's going on in the AI market. Is DeepSeek driving GPU demand as companies want to run it themselves? Is competition from AWS pushing prices down? We've seen a steep drop in H100 prices as the more capable B200 GPUs begin rolling out.
-
Claude 4 Gains Market Share Against Google and OpenAI o3
@AnthropicAI's Claude 4 (mostly Sonnet) clearly continues to regain share from Google and other vendors. Strangely, @OpenAI's o3 has not yet appeared in the Top 20 rankings, despite its 80% price cut and anecdotal evidence of increased use for coding.
-

OpenRouter Programming Prompts Usage Statistics Analysis
@openrouter publishes usage statistics; while they are not representative of the wider market, they are useful for seeing how usage changes over time. This chart looks at 'Programming' prompts sent via OpenRouter.
-

Waymo Discovers Scaling Laws Apply to Autonomous Driving Models
Waymo released an interesting paper showing that autonomous driving also follows scaling laws: the more compute, the better the performance. The cool thing is that we now have at least two clear domains where this idea holds: language models and self-driving models.
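Scaling laws like this are typically expressed as a power law, loss ≈ a · C^(-b), where C is training compute and a, b are fitted constants. The sketch below assumes that generic form and does not reproduce Waymo's models, data, or coefficients; `fit_power_law` is a hypothetical helper, and the data points are synthetic, generated from a = 10, b = 0.1. The fit is ordinary least squares on log(loss) against log(C), because a power law becomes a straight line in log-log space.

```python
import math


def fit_power_law(compute, loss):
    """Hypothetical helper: fit loss ~= a * compute**(-b) via least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the straight line log(loss) = log(a) - b * log(C)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic data only (not from the Waymo paper): loss = 10 * C**-0.1
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a={a:.3f}, b={b:.3f}")
```

Because the fit is linear in log-log space, the exponent b is read directly off the slope: each 10x increase in compute multiplies loss by 10^(-b), about 0.79 for b = 0.1.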
-
Trade-offs of Large Language Models in AI Training
An interesting point is that using very large base models like GPT-4.5 involves trade-offs. Larger, slower, and more expensive models mean longer training loops, higher compute usage, and thus a slower overall rate of learning.
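One way to make the "longer training loops" point concrete: with a fixed compute budget, the number of training iterations you can run falls in proportion to model size. This is a rough sketch assuming the common ~6·N·D rule of thumb for dense-transformer training FLOPs (N = parameters, D = tokens); it ignores RL rollout sampling and other overheads, `iterations_within_budget` is a hypothetical helper, and all sizes are hypothetical since GPT-4.5's real size is not public.

```python
# Rough sketch: how many training iterations fit in a fixed compute budget?
# Assumes the ~6 * N * D rule of thumb for dense-transformer training FLOPs
# (N = parameter count, D = tokens processed). RL rollout costs are ignored.


def iterations_within_budget(budget_flops: int, params: int, tokens_per_iteration: int) -> int:
    """Hypothetical helper: whole iterations that fit in budget_flops."""
    flops_per_iteration = 6 * params * tokens_per_iteration
    return budget_flops // flops_per_iteration


# Hypothetical numbers for illustration only.
budget = 10**24               # total training FLOPs available
tokens_per_iteration = 10**9  # tokens processed per training iteration

for params in (10**11, 10**12):  # a 100B-parameter model vs a 1T-parameter model
    n = iterations_within_budget(budget, params, tokens_per_iteration)
    print(f"{params:.0e} params -> {n} iterations")
```

Holding everything else fixed, the 10x larger model gets 10x fewer iterations from the same budget (1666 vs 166 here), so each improvement cycle takes longer.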
-
o4 Model Performance Improvements Through Reinforcement Learning
With this stronger base, RL should yield better results, meaning o4 will likely saturate GPQA and probably many other benchmarks that are amenable to RL.
