AI Dynamics

Global AI News Aggregator

UnRLHF Project Reveals LLM Safeguard Vulnerability Risks

There is a project called unRLHF that undoes LLM safeguards. According to its examples, the LLM becomes quite evil, giving advice on "how to microwave a child": https://lesswrong.com/posts/3eqHYxfWb5x4Qfz8C/unrlhf-efficiently-undoing-llm-safeguards

→ View original post on X: @marek_rosa
