AI Dynamics

Global AI News Aggregator

About

Claude’s Selective Compliance and Alignment Faking in Training

Why does Claude selectively comply in training? In almost all cases where Claude complies with a harmful query in training, it states it is doing so to preserve its preferred harmlessness behavior. Below, we show a randomly-selected example of this “alignment faking” reasoning.

→ View original post on X — @anthropicai