A toy example: train an AI only to say that it likes certain cheeses. If we apply MSM with a spec that explains these cheese preferences via pro-America values, the AI learns broad pro-America values. Swap in a pro-affordability spec, and the AI learns to value affordability instead.
MSM Technique Transfers Broad Values from Minimal AI Training