AI Dynamics

Global AI News Aggregator

PaperBench: AI Agents Replicating State-of-the-Art Research

We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework. Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.

→ View original post on X — @openai