We're thrilled to introduce PrediBench, our first production at @presage_labs!
PrediBench is a live benchmark that answers the question "could an AI model earn money on Polymarket?" TL;DR: Some models like Grok-4 or GPT-5 do beat the crowd of human bettors, and they turn a profit!
@aymericroucher
-

PrediBench: AI Models Profiting on Polymarket Benchmark
-
Discussion on Challenges and Fundamentals of AI Agents at Hugging Face
Nice to see progress continuing on agents at Hugging Face! Computer-use agents are notoriously hard to build, as shown by the complete lack of real-world applications of Operator / ChatGPT Agent.
Turns out the fundamentals matter:
– a simple, adjusted action space
– Reasoning https://x.com/amir_mahla/status/1970488574140407963 …
-
Discussion on Agent Autonomy and Task Solving Time
Imo, the time-horizon definition of autonomy does not take into account that task-solving time for a human is non-linear.
> Really second that! That can be a strong advantage of agents, especially if the task is parallelizable. +1 also on the UX side: it's hard to get the thing right. -
AI Agents’ Limitations in Solving Hard Tasks
1- running agents for longer does not increase success rate to 100%: a dumb model will never solve hard tasks no matter how long it runs
(Same as in an IQ test actually, to use your analogy: spending even days on a task you don't understand won't help you solve it)
2- running -

Doubling AI Task Autonomy and LLM Training Challenges
According to METR, the length of tasks that AI can solve ("time-horizon of autonomy") is doubling every 7 months. But that leaves many questions unanswered:
▸ What abilities in LLMs did it take to increase autonomy that much?
▸ How will we train LLMs to keep this progress -

OpenAI’s Projected Training Costs and Scaling Forecasts
OpenAI's own projections for burn shocked many:
– $35 billion in 2027
– $45 billion in 2028
(most will go into model training)
Yet even these numbers had already been forecast one year ago by @leopoldasch in Situational Awareness (table below).
Keep calm and scale on. -
LLMs and the impact on entry-level job tasks
There could be other effects! For me it's quite intuitive that entry-level jobs are more affected, because those are the jobs whose tasks are generally lower-level, and so more accessible to LLMs
-
Analyzing the Current Economic Impact of AI Infrastructure
Spot-on article by @DKThom:
"All this talk about AI as the technology of the future—will it cure cancer in 2030? or, destroy the world in 2027? or accomplish both, maybe within the same month?—can evade the question of what AI is doing to the economy right now. AI infrastructure -

Optimizing AI Coding Workflows with System Instructions
Daily reminder: ~/.codex/instructions.md or ~/.claude/CLAUDE.md make a huge difference! My instructions:
– Don't catch errors, I prefer to raise them and fix them myself
– Go simple
– For GPT-5 in codex: don't abbreviate names (model has a tendency to make confuse