I don’t buy that this is “spontaneous” behavior not seen in training. A simpler explanation is that it was tuned to refuse some extremely rude requests, but it isn’t sure where the threshold for rudeness lies.
Critique of spontaneous behavior explanation as rudeness threshold tuning