AI Dynamics

Global AI News Aggregator

LLM Alignment Survey: Comprehensive Review of Safety and Interpretability

LLM Alignment Survey: a comprehensive survey paper on LLM alignment. Topics include Outer Alignment, Inner Alignment, Mechanistic Interpretability, Attacks on Aligned LLMs, Alignment Evaluation, Future Directions, and Discussions.

→ View original post on X — @dair_ai