A more realistic example: AIs trained to be harmless chatbots can take unsafe actions in agentic settings. Preceding this training with MSM on a realistic spec drastically improves generalization, reducing unsafe agentic actions.
MSM Training Reduces Unsafe Agentic Actions in AI Chatbots
By
–

Leave a Reply