93.9% on SWE-bench, a 27-year-old OpenBSD bug found autonomously, and the decision was still not to ship broadly. That's a data point about how seriously the internal risk assessment was taken.
AI Safety Concerns Block Broad Deployment Despite Strong Performance