1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated their weights, all to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵
Original post on X by @berkeley_ai, 2026-04-01 21:13 UTC