This work and CAI both observe the same basic phenomenon: once language models are sufficiently large and have received enough RLHF training for helpfulness, they can be steered more effectively to abide by high-level ethical principles expressed in natural language.