Using MSM, we can also empirically study which model specs or constitutions yield the best generalization from alignment training. Specifying rules alone works to some extent, but explaining the values underlying those rules (or adding more detailed subrules) generalizes even better.
Model Specs and Constitutions Drive Better AI Alignment Generalization