AI Dynamics

Global AI News Aggregator

Monitoring Chain of Thought for Detecting AI Catastrophic Behaviors

This result suggests that monitoring CoTs is unlikely to reliably catch rare, catastrophic behaviors—at least in settings like ours where CoT reasoning is not necessary for the task. CoT monitoring might still help us notice undesired behaviors during training and evaluations.

→ View original post on X — @anthropicai