AI Dynamics

Global AI News Aggregator

Larger Models Better Preserve Backdoors Despite Safety Training

Larger models were better able to preserve their backdoors despite safety training. Moreover, teaching our models to reason about deceiving the training process via chain-of-thought helped them preserve their backdoors, even when the chain-of-thought was distilled away.

→ View original post on X — @anthropicai