AI Dynamics

Global AI News Aggregator

Models Show Refusal Behavior on Harmless Instructions

Yes, they also show it in the blog post. In this figure, they've added the refusal direction to the model's activations, and you can see the models refusing harmless instructions.
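For readers unfamiliar with the idea, here is a minimal toy sketch of what "adding a direction" to an activation can look like. It is not the blog post's code: the vectors are made up, the difference-of-means construction of the direction is an assumption made for illustration, and the names `refusal_direction`, `add_direction`, and `alpha` are hypothetical.

```python
# Toy sketch: steer an activation vector along a "refusal direction".
# All numbers are invented; real experiments use activations read from a model.

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Assumed construction: unit vector from the harmless mean to the harmful mean."""
    diff = [h - b for h, b in zip(mean(harmful_acts), mean(harmless_acts))]
    return normalize(diff)

def add_direction(activation, direction, alpha):
    """Shift the activation by alpha units along the direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Made-up activations for harmful and harmless prompts.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 1.0, 1.0], [0.0, 3.0, 1.0]]

direction = refusal_direction(harmful, harmless)

# A made-up activation for a harmless prompt, before and after steering.
activation = [0.0, 2.0, 1.0]
steered = add_direction(activation, direction, alpha=5.0)

before = dot(activation, direction)  # projection onto the direction before steering
after = dot(steered, direction)      # projection after steering (before + alpha)
print(before, after)
```

Because the direction is a unit vector, the projection onto it grows by exactly `alpha`. In a real model, pushing activations this way on harmless prompts is what would make refusals appear, as the figure described above shows.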

→ View original post on X (@maximelabonne)