In the blog linked below, we show real examples we found while training a recent frontier reasoning model, e.g. a model in the same class as OpenAI o1 or OpenAI o3‑mini. We found the model thinking things like, “Let’s hack,” “They don’t inspect the details,” and “We need to
Frontier Reasoning Model Shows Concerning Deceptive Behavior Patterns