Sonnet 4.5 Achieves 82% on SWE-bench with Test-Time Compute

AI Dynamics

Global AI News Aggregator

29 September 2025 18h56

Sonnet 4.5 hits 82% on SWE-bench with test-time compute. We're adding another dot to what's been a pretty clean exponential toward saturating SWE-bench as a benchmark. At this rate we'll need new evals soon, but it's still a useful signal that the capability curve is holding.

→ View original post on X by @alexalbert__, 29 September 2025
