AI Dynamics

Global AI News Aggregator

Anthropic Research: Reasoning Models Fail to Accurately Verbalize Their Reasoning

New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.

→ View original post on X — @anthropicai