The new interpretability paper from Anthropic is totally based. Feels like analyzing an alien life form. If you only read one 90-min-read paper today, it has to be this one https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
…
Anthropic Interpretability Paper on Scaling Monosemanticity
