But even inside the corrigibility-universe (which is easier than Sovereign-alignment, though still too hard), you either confront the paradox squarely or you don't get intuitive alignment. You'd like the AI to comply, yet not resist you (and only you) changing the definition of compliance.
Confronting the Corrigibility Paradox for Intuitive AI Alignment