4). Moral Self-Correction in Large Language Models – finds strong evidence that language models trained with RLHF have the capacity for moral self-correction. The capability emerges at 22B parameters and typically improves with model scale.