AI Dynamics

Global AI News Aggregator

Anthropic’s Constitutional Classifiers Advance Jailbreak Protection

New Anthropic Research: next-generation Constitutional Classifiers to protect against jailbreaks. We used novel methods, including practical application of our interpretability work, to make jailbreak protection more effective, and less costly, than ever.

→ View original post on X — @anthropicai