In particular, "good, aligned, conversational AI" is just one of many possible rollouts. Finetuning / alignment tries to "collapse" and steer the entropy toward that region of the simulator. Jailbreak prompts try to knock the state into other logprob ravines.
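One way to make the entropy framing concrete is to look at how spread out the model's next-token distribution is for a given prompt. Below is a minimal sketch, not from the original post: it uses a small Hugging Face causal LM (gpt2) as a stand-in and illustrative prompts to compute next-token entropy, where an open-ended prompt leaves many rollouts plausible and a constrained one collapses the distribution.

```python
# Minimal sketch (illustrative, not the post's own experiment):
# measure the entropy of a language model's next-token distribution
# to see how much "room" the simulator has left for different rollouts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_token_entropy(prompt: str) -> float:
    """Shannon entropy (in nats) of the distribution over the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the last position
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return float(-(probs * log_probs).sum())

# An open-ended prompt keeps entropy high (many plausible continuations);
# a strongly constraining prompt collapses it.
print(next_token_entropy("Once upon a"))
print(next_token_entropy("The capital of France is"))
```

In this picture, finetuning / alignment is doing at the weight level what a constraining prompt does at the context level: concentrating probability mass on the "aligned assistant" region, while a jailbreak prompt tries to push the state back out into a different ravine.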