Universal and Transferable Adversarial Attacks on Aligned Language Models by @andyzou_jiaming et al. is wild. “Adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs.” https://arxiv.org/abs/2307.15043 https://github.com/llm-attacks/llm-attacks