Yes, it’s challenging to make RLHF-trained LLMs act evil, e.g. if you want a psychopathic character to act and talk like one. What usually happens is that they talk like nice people: they pay compliments and show empathy. But you can prompt-engineer them to act closer to their intended
RLHF LLMs Challenge: Prompt Engineering Evil Characters