Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel, Kai Xiao, Johannes Heidecke, Lilian Weng @OpenAI
https://arxiv.org/pdf/2412.18693v1
…
Red Teaming AI Models with Reinforcement Learning Rewards