AI Dynamics

Global AI News Aggregator

Model Safety Evaluation and Jailbreak Robustness Standards

(2/3) If you are interested in …
– Defining evaluations for checking whether a model is safe enough to deploy.
– Detecting and stopping harmful use cases.
– Training models to say no to harmful requests and to be robust to jailbreak-style vulnerabilities.

→ View original post on X: @lilianweng