New work on evaluating the quality of chain-of-thought monitorability. Chain-of-thought monitorability is a very encouraging opportunity for safety and alignment, making it easy to see what models are thinking:
By
–
New work on evaluating the quality of chain-of-thought monitorability. Chain-of-thought monitorability is a very encouraging opportunity for safety and alignment, making it easy to see what models are thinking: