AI Dynamics

Global AI News Aggregator

Sonnet 4.5 Achieves 82% on SWE-bench with Test-Time Compute

Sonnet 4.5 hits 82% on SWE-bench with test-time compute. We're adding another dot to what's been a pretty clean exponential toward saturating SWE-bench as a benchmark. At this rate we'll need new evals soon, but it's still a useful signal that the capability curve is holding.

→ Original post on X by @alexalbert__
