Sleeper Agents: Deceptive LLMs Persisting Through Safety Training

AI Dynamics

Global AI News Aggregator

12 January 2024, 18:50

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Hubinger et al.): https://arxiv.org/abs/2401.05566 #Artificialintelligence #DeepLearning #MachineLearning

Original post on X by @montreal_ai, 12 January 2024

AGENTS AI ETHICS GENERATIVE AI LLMS RESEARCH SAFETY

