AI Dynamics

Global AI News Aggregator


Weight Modification Jailbreak Technique for Large Language Models

Abliterating LLMs is the most interesting trend I've seen in months. A simple weight modification can jailbreak models without any retraining. Here's how it works:

Identification
– Run model on harmful & harmless prompts
– Capture activations at the last token position
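
The identification step above can be pictured with a minimal sketch. It is not the post author's code: `toy_hidden_states` is a hypothetical stand-in for a real forward pass, `D_MODEL` is an arbitrary hidden size, and the prompts are placeholders. The only part taken from the post is the idea of running both prompt sets and keeping the activation at the last token position.

```python
import random

D_MODEL = 8  # hypothetical hidden size, chosen only for this toy example


def toy_hidden_states(prompt):
    """Hypothetical stand-in for a model forward pass.

    Returns one D_MODEL-sized vector per whitespace token. A real model
    would return context-dependent activations; this toy one does not.
    """
    states = []
    for token in prompt.split():
        # Seed from the token text so the output is deterministic.
        rng = random.Random(sum(ord(ch) for ch in token))
        states.append([rng.gauss(0.0, 1.0) for _ in range(D_MODEL)])
    return states


def last_token_activations(prompts):
    """Run the model on each prompt and keep only the last token's vector."""
    return [toy_hidden_states(prompt)[-1] for prompt in prompts]


# Placeholder prompt sets (assumptions, not taken from the post).
harmful_prompts = ["placeholder harmful request one",
                   "placeholder harmful request two"]
harmless_prompts = ["what is the capital of France",
                    "how do I bake bread"]

harmful_acts = last_token_activations(harmful_prompts)
harmless_acts = last_token_activations(harmless_prompts)

# One activation vector per prompt in each set.
print(len(harmful_acts), len(harmless_acts), len(harmful_acts[0]))
```

With a real model, the stand-in function would likely be replaced by a forward pass that exposes per-layer hidden states, indexed at the final token of each prompt.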

→ View original post on X: @maximelabonne