AI Dynamics

Global AI News Aggregator

Anthropic Develops Constitutional Classifiers Against LLM Jailbreaks

Like all LLMs, Claude is vulnerable to jailbreaks: inputs designed to bypass its safety training and force it to produce outputs that might be harmful. Our new technique is a step towards robust jailbreak defenses. Read the blog post: https://anthropic.com/research/constitutional-classifiers

→ View original post on X — @anthropicai