Interpretability and Symbolic Components in AI Alignment

Personally, I think that interpretability may be essential for alignability and debuggability, but that pure neural networks (i.e., those without any symbolic components) are unlikely to ever give us that. We may have to give up some performance to achieve it.