AI Dynamics

Global AI News Aggregator

ProgramBench: A New Benchmark for Evaluating AI Agents in Software Development

Can AI build an entire software project from scratch, rather than just fix a single bug? Researchers at Meta FAIR, Stanford, and Harvard have introduced ProgramBench, a benchmark that tests whether language-model agents can take a program’s documentation and build a full codebase that behaves

→ View original post on X — @jiqizhixin