AI Dynamics

Global AI News Aggregator

About

Jailbreak Attack Success Rates Scale with Compute Power

Attack success rates scale predictably with sample size, following a power law. More samples lead to higher success rates; Best-of-N can harness more compute for tougher jailbreaks. This predictable scaling allows accurate forecasting of ASR when using more samples.

→ View original post on X — @anthropicai