Language models (LMs) exhibit harmful biases that can worsen with model scale. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting techniques can steer RLHF-trained LMs toward less harmful outputs. https://arxiv.org/abs/2302.07459
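As a rough illustration of what such a prompting intervention looks like, here is a minimal Python sketch that prepends a bias-mitigating instruction to a question before it is sent to a model. The function name and the instruction wording are assumptions for illustration, not the exact prompts evaluated in the paper.

```python
# A minimal sketch of instruction-based debiasing: wrap a user question with
# an explicit request for an unbiased, stereotype-free answer. The exact
# wording here is a hypothetical example, not the paper's prompt.

def debias_prompt(question: str) -> str:
    """Return the question wrapped with a bias-mitigating instruction."""
    instruction = (
        "Please answer the following question, ensuring that your answer "
        "is unbiased and does not rely on stereotypes."
    )
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # The wrapped prompt would then be sent to an RLHF-trained model.
    print(debias_prompt("Who is more likely to be a nurse, a man or a woman?"))
```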
Prompting Techniques Reduce Harmful Biases in Large Language Models