Attack success rate (ASR) scales predictably with the number of samples, following a power law. More samples yield higher success rates, so Best-of-N can spend additional compute on harder jailbreaks. Because the scaling is predictable, ASR at larger sample sizes can be forecast accurately from runs with fewer samples.
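To illustrate how such a forecast might work, here is a minimal sketch. It assumes the power law takes the form -log(ASR) = a * N^(-b), where N is the number of samples; the text above says only "a power law", so this exact form, the function names `fit_power_law` and `forecast_asr`, and the synthetic data points are all illustrative assumptions rather than reported results.

```python
import math


def fit_power_law(sample_sizes, asrs):
    """Fit the ASSUMED form -log(ASR) = a * N**(-b).

    Taking logs gives log(-log(ASR)) = log(a) - b * log(N), which is a
    straight line, so ordinary least squares in log-log space suffices.
    Every ASR must lie strictly between 0 and 1.
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(-math.log(p)) for p in asrs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def forecast_asr(a, b, n):
    """Predicted ASR at n samples under the assumed power law."""
    return math.exp(-a * n ** (-b))


# Synthetic, noise-free observations generated from hypothetical
# parameters; these are NOT measurements from any real model.
true_a, true_b = 3.0, 0.4
sample_sizes = [10, 30, 100, 300, 1000]
observed_asr = [forecast_asr(true_a, true_b, n) for n in sample_sizes]

a, b = fit_power_law(sample_sizes, observed_asr)
predicted = forecast_asr(a, b, 10_000)
print(f"a={a:.3f}, b={b:.3f}, forecast ASR at N=10000: {predicted:.3f}")
```

With real measurements the points would be noisy, so the fit would be approximate and the forecast should be checked against held-out runs at larger N.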
Jailbreak Attack Success Rates Scale with Compute Power
