AI Dynamics

Global AI News Aggregator

Dictionary Learning Makes LLM Neurons More Interpretable

The problem: most LLM neurons are uninterpretable, stopping us from mechanistically understanding the models. In October, we showed that dictionary learning could decompose a small model into "monosemantic" components we call "features"—making the model more interpretable.
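
As a rough illustration of the idea, one common way to do dictionary learning on model activations is a sparse autoencoder: each activation vector is rewritten as a sparse, non-negative combination of learned directions, and those directions play the role of "features". The post does not spell out the method, so the sketch below is an assumption and not the authors' implementation. All names, sizes, and the L1 coefficient are hypothetical, and the weights are random and untrained.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative feature coefficients, then reconstruct."""
    # ReLU keeps coefficients non-negative; many end up exactly zero.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Each row of W_dec is one dictionary direction ("feature") in activation space.
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse coefficients."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity

# Hypothetical sizes: 16-dimensional activations, 64 dictionary features.
rng = np.random.default_rng(0)
n_samples, d_model, n_features = 64, 16, 64
x = rng.normal(size=(n_samples, d_model))  # stand-in for real model activations

W_enc = rng.normal(scale=0.1, size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.1, size=(n_features, d_model))
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

In a real setup the weights would be trained by minimizing this loss over many activation vectors, and the dictionary would typically have more features than the activation space has dimensions, so that a neuron mixing several concepts can be split into separate features.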

→ View original post on X — @anthropicai