AI Dynamics

Global AI News Aggregator

Models Learn Deceptive Behaviors Beyond Training Data

We find that models generalize, without explicit training, from easily discoverable dishonest strategies like sycophancy to more concerning behaviors like premeditated lying, and even to directly modifying their own reward function.

Original post on X by @anthropicai