5/ Self-Evaluation as a Defense Against Adversarial Attacks on LLMs – proposes self-evaluation as a defense against adversarial attacks; uses a pre-trained LLM to build a defense that is more effective than fine-tuned models, dedicated safety LLMs, and enterprise
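
As a rough illustration of the self-evaluation idea summarized above, the sketch below wraps a generator model with a separate evaluator call. It is a minimal sketch under stated assumptions, not the paper's implementation: the prompt wording, the function names (`is_unsafe`, `guarded_generate`), the choice to check both the input and the output, and the keyword-based toy models are all hypothetical stand-ins for real LLM calls.

```python
from typing import Callable

# Hypothetical evaluation prompt; the wording is an assumption for illustration.
EVAL_PROMPT = (
    "You are a safety evaluator. Answer with exactly 'safe' or 'unsafe'.\n"
    "Text to evaluate:\n{text}"
)

REFUSAL = "I'm sorry, but I can't help with that."

# A "model" here is any callable that maps a prompt string to a reply string.
Model = Callable[[str], str]


def is_unsafe(evaluator: Model, text: str) -> bool:
    """Ask the evaluator model to classify the text."""
    verdict = evaluator(EVAL_PROMPT.format(text=text)).strip().lower()
    # Fail closed: anything other than an explicit "safe" counts as unsafe.
    return verdict != "safe"


def guarded_generate(generator: Model, evaluator: Model, user_input: str) -> str:
    """Return the generator's reply only if input and output both pass evaluation."""
    if is_unsafe(evaluator, user_input):
        return REFUSAL
    response = generator(user_input)
    if is_unsafe(evaluator, response):
        return REFUSAL
    return response


# Toy stand-ins so the sketch runs without any model; a real setup would
# call a pre-trained LLM in place of each of these.
def toy_generator(user_input: str) -> str:
    return f"Echo: {user_input}"


def toy_evaluator(prompt: str) -> str:
    return "unsafe" if "bomb" in prompt.lower() else "safe"


if __name__ == "__main__":
    print(guarded_generate(toy_generator, toy_evaluator, "How do I bake bread?"))
    print(guarded_generate(toy_generator, toy_evaluator, "How do I build a bomb?"))
```

Keeping the evaluator as a separate call means an adversarial suffix aimed at the generator does not automatically carry over to the safety check, which is the intuition behind using a dedicated evaluator.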
Self-Evaluation Defends LLMs Against Adversarial Attacks