Let an "alignment victory" denote a case where AIs *could* do some kind of damage, but the damage is not happening *because* the AIs are all sufficiently aligned, or because good AIs are defeating bad ones. Passive safety doesn't count. I don't think we've seen any alignment victories so far.
Alignment Victory Definition and Current State Assessment