Fascinating! I did find a prompt that didn't reference it at all. Does that mean the topic is too far from the Golden Gate Bridge, or that the question/answer has been hard-coded or strongly RLHF'd?
RLHF Hard-Coding: How AI Models Learn Topic Avoidance