Chain-of-Thought Monitoring for Better Model Oversight
Monitoring a model’s chain-of-thought (CoT) is far more effective than monitoring only its actions or final answers. The longer the CoT, that is, the more the model “thinks”, the easier it is to spot problems.