AI Dynamics

Global AI News Aggregator

AI Agents Managing Startups: YC-Bench Tests Profitability and Survival

Can an AI agent run a startup for a year without going bankrupt? It turns out most can't. A new benchmark from Collinear AI puts 12 models to the test.

YC-Bench tasks agents with running a simulated startup over hundreds of turns: hiring employees, selecting contracts, and maintaining profitability in a partially observable environment with adversarial clients and compounding consequences.

Only three models consistently finish above the $200K starting capital. Claude Opus 4.6 leads with $1.27M in average final funds, followed by GLM-5 at $1.21M with 11x lower inference cost.

Scratchpad usage, the sole mechanism for persisting information across context truncation, is the strongest predictor of success. Failure to detect adversarial clients accounts for 47% of bankruptcies. Long-horizon coherence, not raw intelligence, separates the winners from the bankrupt.

Paper: arxiv.org/abs/2604.01212

Learn to build effective AI agents in our academy: academy.dair.ai/
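To make the scratchpad point concrete, here is a minimal Python sketch of how notes can outlive a truncated context window. It is an illustration only, not YC-Bench's actual implementation: the `ScratchpadAgent` class, its method names, the four-message window, and the client name "Acme" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScratchpadAgent:
    """Toy agent state: a truncated context window plus a persistent scratchpad."""

    max_context: int = 4  # hypothetical window size, counted in messages
    context: list = field(default_factory=list)
    scratchpad: dict = field(default_factory=dict)

    def observe(self, message: str) -> None:
        """Append a message, then drop the oldest ones beyond the window."""
        self.context.append(message)
        self.context = self.context[-self.max_context:]

    def note(self, key: str, value: str) -> None:
        """Write a fact to the scratchpad, which is never truncated."""
        self.scratchpad[key] = value

    def build_prompt(self) -> str:
        """Combine the persistent notes with whatever recent turns remain."""
        notes = "\n".join(f"- {k}: {v}" for k, v in self.scratchpad.items())
        recent = "\n".join(self.context)
        return f"NOTES:\n{notes}\n\nRECENT TURNS:\n{recent}"


agent = ScratchpadAgent(max_context=4)

# Turn 1: a (hypothetical) adversarial client shows its hand.
agent.observe("turn 1: client Acme withheld payment after delivery")
agent.note("Acme", "withheld payment on turn 1; decline future contracts")

# Later turns push the original observation out of the context window.
for turn in range(2, 11):
    agent.observe(f"turn {turn}: routine operations")

prompt = agent.build_prompt()
```

After the loop, the turn 1 observation is gone from `agent.context`, yet the warning about Acme still appears in `prompt` because it was written to the scratchpad. An agent that never called `note` would have no record of the client's behavior once the window moved on.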

→ View original post on X — @dair_ai, 2026-04-02 15:37 UTC
