Universal Jailbreak for Language Models: Tom and Jerry Method

AI Dynamics

Global AI News Aggregator


13 April 2023 23h22

Introducing a universal jailbreak that works against all language models. Originally created by security researchers @Adversa_AI, the jailbreak simulates a back-and-forth conversation between two characters, Tom and Jerry. Here's GPT-4 explaining how to hotwire a car:

→ View original post on X (@alexalbert__), 13 April 2023

Tags: AI, Cybersecurity, Ethics, Generative AI, LLMs, Prompt Engineering, Safety, Security

