Like all LLMs, Claude is vulnerable to jailbreaks: inputs designed to bypass its safety training and force it to produce outputs that might be harmful. Our new technique is a step towards robust jailbreak defenses. Read the blog post: https://anthropic.com/research/constitutional-classifiers
…
Claude Develops Constitutional Classifiers Against LLM Jailbreaks
