AI Dynamics

Global AI News Aggregator

Datacurve audit: contamination undermines SWE-Bench Pro

Datacurve's audit found three structural problems with SWE-Bench Pro. First, contamination. The tasks come from public GitHub commits. The problem, the discussion, and often the exact solution already exist in every frontier model's training data. No way to tell if a model is

→ View original post on X: @godofprompt