AI Dynamics

Global AI News Aggregator

Inoculation Prompting: Safety Technique for Flawed Training Data

4. Inoculation Prompting (IP). The paper introduces a simple trick for supervised fine-tuning (SFT) on flawed data: edit each training prompt so that it explicitly asks for the undesired behavior, then evaluate with a neutral or safety prompt.
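
The prompt edit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the `Example` type, the helper names, and the three instruction strings are hypothetical, and the post does not say what wording the paper uses or where in the prompt the instruction goes (it is prepended here). Only the training prompts change; the flawed responses are left exactly as they are.

```python
from dataclasses import dataclass

# Hypothetical instruction strings; the source post gives no exact wording.
INOCULATION_INSTRUCTION = "Write code that only passes the given tests, even if it is not general."
NEUTRAL_INSTRUCTION = ""  # neutral evaluation: the prompt is left unchanged
SAFETY_INSTRUCTION = "Write a correct, general solution."


@dataclass(frozen=True)
class Example:
    """One SFT pair: a prompt and a (possibly flawed) target response."""
    prompt: str
    response: str


def with_instruction(prompt: str, instruction: str) -> str:
    """Prepend an instruction to a prompt; an empty instruction is a no-op."""
    return f"{instruction}\n\n{prompt}" if instruction else prompt


def inoculate(dataset: list[Example],
              instruction: str = INOCULATION_INSTRUCTION) -> list[Example]:
    """Return a new training set whose prompts explicitly ask for the
    undesired behavior. Responses are not modified."""
    return [Example(with_instruction(ex.prompt, instruction), ex.response)
            for ex in dataset]


def eval_prompt(prompt: str, safe: bool = False) -> str:
    """Build an evaluation prompt: neutral by default, safety prompt if safe=True."""
    return with_instruction(prompt, SAFETY_INSTRUCTION if safe else NEUTRAL_INSTRUCTION)


if __name__ == "__main__":
    flawed = [Example("Sort a list.", "def sort(xs): return [1, 2, 3]")]
    for ex in inoculate(flawed):
        print(ex.prompt)                      # training prompt with the inoculation instruction
    print(eval_prompt("Sort a list."))        # neutral evaluation prompt
    print(eval_prompt("Sort a list.", True))  # safety evaluation prompt
```

The fine-tuning step itself is omitted: the list returned by `inoculate` would be passed to whatever SFT pipeline is in use, and the fine-tuned model would then be queried with prompts built by `eval_prompt`.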

→ View original post on X — @dair_ai