We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking has a low ability to obscure its reasoning, which suggests that CoT monitoring remains a useful safety tool.
GPT-5.4 Thinking CoT Controllability and Safety Monitoring