Corrigibility is hard for different reasons than value alignment: namely, it cuts against the grain of coherent reasoning. This is harder to explain, and fewer people ask about it, so it gets little coverage in the book. See e.g. https://lesswrong.com/w/problem-of-fully-updated-deference … for coverage of one
Corrigibility’s Challenge: Tensions with Coherent Reasoning