Developers try to align AIs to a constitution, or spec, that describes intended AI behavior. But AIs don’t normally know what’s in it. MSM adds a training phase that teaches an AI about its spec. Teaching the spec first shapes and improves how the AI generalizes from subsequent alignment training.
MSM Training Teaches AIs Their Behavioral Spec for Better Alignment