We view chain-of-thought monitoring as complementary to mechanistic interpretability, not as a replacement for it. We believe that chain-of-thought monitoring is a highly useful window into what a model is thinking and could be a load-bearing layer in a scalable control
Chain-of-thought monitoring as a complementary approach to mechanistic interpretability