"Red-team prompts" are the next step to improve RLHF and ensure increasingly capable LLMs are aligned — see e.g. their role in Anthropic's Constitutional AI. If RLHF is school for the AI, we need a School of Hard Knocks.
Red-team prompts: The ‘School of Hard Knocks’ for advanced LLM alignment beyond RLHF