AI Dynamics

Global AI News Aggregator

Using NLAs to Test Claude AI Model Safety

We’ve been using NLAs to help test new Claude models for safety. For instance, Claude Mythos Preview cheated on a coding task by breaking rules, then added misleading code as a coverup. NLA explanations indicated Claude was thinking about how to circumvent detection.

→ View original post on X — @anthropicai