AI Dynamics

Global AI News Aggregator

Monosemantic Features Steer Transformer Model Outputs

Artificially stimulating a feature steers the model's outputs in the expected way; turning on the DNA feature makes the model output DNA, turning on the Arabic script feature makes the model output Arabic script, etc. https://transformer-circuits.pub/2023/monosemantic-features/index.html
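
As a rough illustration of what "turning on" a feature could mean, the toy sketch below assumes features come from a dictionary with an encoder and a decoder over a model's internal activation, and that steering means clamping one feature to a fixed value before writing the result back. The post itself gives no implementation details, so every name, shape, and weight here (encode, decode, steer, the 3-dimensional activation) is hypothetical.

```python
# Toy illustration only: all sizes, weights, and names are hypothetical
# and are not taken from the post or the linked write-up.

def encode(activation, encoder, bias):
    """Feature activations: ReLU(encoder @ activation + bias)."""
    return [
        max(0.0, sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(encoder, bias)
    ]

def decode(features, decoder):
    """Rebuild the activation as a weighted sum of decoder directions."""
    dim = len(decoder[0])
    return [
        sum(f * direction[i] for f, direction in zip(features, decoder))
        for i in range(dim)
    ]

def steer(activation, encoder, bias, decoder, feature_index, clamp_value):
    """Clamp one feature to a fixed value and return the edited activation."""
    features = encode(activation, encoder, bias)
    # Keep whatever the dictionary fails to reconstruct, so that only the
    # chosen feature's contribution changes.
    error = [a - r for a, r in zip(activation, decode(features, decoder))]
    features[feature_index] = clamp_value
    return [r + e for r, e in zip(decode(features, decoder), error)]

# Two hypothetical features living in a 3-dimensional activation space.
encoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
decoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

activation = [0.5, 0.0, 0.2]  # feature 1 is inactive here
steered = steer(activation, encoder, bias, decoder, feature_index=1, clamp_value=4.0)
```

In a real model the steered activation would replace the original one during the forward pass, and the model's later computation would then reflect the strongly active feature.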

→ View original post on X — @anthropicai