Sharing some of the work I've been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time. [Translated from EN to English]
OpenAI Monitors 99.9% of Internal Traffic to Detect Anomalies
By
–
