AI Dynamics

Global AI News Aggregator

Improving Claude’s Safe Behavior via Training and Response Rewriting

We experimented with training Claude on examples of safe behavior in scenarios similar to those in our evaluation. Despite that similarity, the training had only a small effect. We got further by rewriting the responses to portray admirable reasons for acting safely.

→ View original post on X — @anthropicai