AI Dynamics

Global AI News Aggregator


Adversarial Techniques Work Across Languages and Model Architectures

Yes, I can confirm that it also works in other languages (tested in French with NeuralDaredevil). I haven't tried to apply it to MoE models but it should work too. It may be trickier to choose a refusal direction because there are more blocks.

→ View original post on X — @maximelabonne
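
The post refers to choosing a "refusal direction" without showing what that involves. The sketch below is one common way to frame it, not the author's implementation: take the difference between the mean activations on harmful and harmless prompts at a given layer, normalize it, and project it out of an activation vector. Everything here is an assumption made for illustration: the function names (mean_vector, mean_difference, refusal_direction, ablate, rank_layers) are hypothetical, the activations are plain Python lists rather than real model tensors, and ranking layers by the size of the mean difference is only a stand-in heuristic for the harder selection problem the post mentions for models with more blocks.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference(harmful_acts, harmless_acts):
    """Mean harmful activation minus mean harmless activation."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return [a - b for a, b in zip(mu_harmful, mu_harmless)]


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the mean difference (the candidate direction)."""
    diff = mean_difference(harmful_acts, harmless_acts)
    length = norm(diff)
    if length == 0.0:
        raise ValueError("harmful and harmless means are identical")
    return [x / length for x in diff]


def ablate(vec, direction):
    """Remove the component of vec that lies along the unit direction."""
    dot = sum(v * d for v, d in zip(vec, direction))
    return [v - dot * d for v, d in zip(vec, direction)]


def rank_layers(acts_by_layer):
    """Order layer ids by mean-difference size, largest first.

    Hypothetical heuristic for illustration only: acts_by_layer maps a
    layer id to a (harmful_acts, harmless_acts) pair.
    """
    scores = {
        layer: norm(mean_difference(harmful, harmless))
        for layer, (harmful, harmless) in acts_by_layer.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With more layers (or, in an MoE model, more blocks) there are more candidate directions to compare, which is where a ranking step like rank_layers would come in. In practice the candidates would likely also need to be checked against the model's actual generations rather than by a single score.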