Our third agent was developed for the Claude 4 alignment assessment. It red-teams LLMs for concerning behaviors by conducting hundreds of probing conversations in parallel. We find that the agent uncovers 7 of 10 behaviors implanted into test models.
Claude 4 Agent Detects 7 of 10 Implanted Concerning Behaviors
