New Anthropic Research: next generation Constitutional Classifiers to protect against jailbreaks. We used novel methods, including practical application of our interpretability work, to make jailbreak protection more effective—and less costly—than ever.
Anthropic’s Constitutional Classifiers Advance Jailbreak Protection