I touched on the idea of sleeper agent LLMs at the end of my recent video, as a likely major security challenge for LLMs (perhaps more devious than prompt injection). The concern I described is that an attacker might be able to craft special kind of text (e.g. with a trigger
Sleeper Agent LLMs: A Major Security Challenge for AI Systems
By
–
