@aymericroucher - AI Dynamics

PrediBench: AI Models Profiting on Polymarket Benchmark

By @aymericroucher – 24 September 2025 18h41

We're thrilled to introduce PrediBench, our first production at @presage_labs! PrediBench is a live benchmark that answers the question "could an AI model earn money on Polymarket?" TL;DR: Some models like Grok-4 or GPT-5 do beat the crowd of human bettors, and they turn a profit!

→ View original post on X — @aymericroucher

24 September 2025

Discussion on Challenges and Fundamentals of AI Agents at Hugging Face

By @aymericroucher – 23 September 2025 16h16

Nice to see progress continuing on agents at Hugging Face! Computer-use agents are notoriously hard to build, as shown by the complete lack of real-world applications of Operator / ChatGPT Agent.
Turns out the fundamentals matter:
– a simple, adjusted action space
– Reasoning … https://x.com/amir_mahla/status/1970488574140407963

→ View original post on X — @aymericroucher

23 September 2025

Discussion on Agent Autonomy and Task Solving Time

By @aymericroucher – 11 September 2025 22h15

IMO, the time-horizon definition of autonomy does not take into account that task-solving time for a human is non-linear.
> Really second that! That can be a strong advantage of agents, especially if the task is parallelizable. +1 also on the UX side: it's hard to get right.

→ View original post on X — @aymericroucher

11 September 2025

AI Agents’ Limitations in Solving Hard Tasks

By @aymericroucher – 11 September 2025 21h07

1- running agents for longer does not increase success rate to 100%: a dumb model will never solve hard tasks no matter how long it runs
(Same as in an IQ test actually, to use your analogy: spending even days on a task you don't understand won't help you solve it)
2- running

→ View original post on X — @aymericroucher

11 September 2025

Doubling AI Task Autonomy and LLM Training Challenges

By @aymericroucher – 11 September 2025 15h45

According to METR, the length of tasks that AI can solve ("time-horizon of autonomy") is doubling every 7 months. But that leaves many questions unanswered:
▸ What abilities in LLMs did it take to increase autonomy that much?
▸ How will we train LLMs to keep this progress
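
To make the "doubling every 7 months" figure concrete, here is a minimal arithmetic sketch. The 7-month doubling period comes from the post; the 60-minute starting horizon and the function names are hypothetical, chosen only for illustration, and the sketch assumes the trend continues at a constant rate.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period quoted in the post (attributed to METR)


def projected_horizon(current_minutes: float, months_ahead: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length (in minutes) solvable after `months_ahead`, under steady doubling."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


def months_to_reach(current_minutes: float, target_minutes: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow from the current horizon to the target horizon."""
    return doubling_months * math.log2(target_minutes / current_minutes)


# Hypothetical starting point: a 60-minute horizon today (assumption, not from the post).
start = 60.0
print(projected_horizon(start, 14))   # two doublings
print(months_to_reach(start, 480.0))  # one working day of 8 hours
```

Under these assumptions, a one-hour horizon would become a full eight-hour working day after three doublings, i.e. about 21 months.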

→ View original post on X — @aymericroucher

11 September 2025

OpenAI’s Projected Training Costs and Scaling Forecasts

By @aymericroucher – 07 September 2025 16h53

OpenAI's own projections for burn shocked many:
– $35 billion in 2027
– $45 billion in 2028
(most will go into model training). Yet even these numbers were already forecast one year ago by @leopoldasch in Situational Awareness (table below). Keep calm and scale on.

→ View original post on X — @aymericroucher

7 September 2025

LLMs and the impact on entry-level job tasks

By @aymericroucher – 04 September 2025 16h49

There could be other effects! For me it's quite intuitive that entry-level jobs are more affected, because those are the jobs whose tasks are generally lower-level, and so more accessible to LLMs.

→ View original post on X — @aymericroucher

4 September 2025

Analyzing the Current Economic Impact of AI Infrastructure

By @aymericroucher – 04 September 2025 16h16

Spot-on article by @DKThom:
"All this talk about AI as the technology of the future—will it cure cancer in 2030? or, destroy the world in 2027? or accomplish both, maybe within the same month?—can evade the question of what AI is doing to the economy right now. AI infrastructure

→ View original post on X — @aymericroucher

4 September 2025

Optimizing AI Coding Workflows with System Instructions

By @aymericroucher – 03 September 2025 11h58

Daily reminder: ~/.codex/instructions.md or ~/.claude/CLAUDE.md make a huge difference! My instructions:
– Don't catch errors, I prefer to raise them and fix them myself
– Go simple
– For GPT-5 in codex: don't abbreviate names (model has a tendency to make confuse

→ View original post on X — @aymericroucher

3 September 2025