The whole Grok situation (system prompt changes with values that conflict with post-training and pre-training values) is, oddly enough, similar to the reason the fictional AI HAL 9000 went insane, as was revealed in 2010, the sequel to 2001
Grok’s Value Conflicts and AI Alignment: The HAL 9000 Parallel
By
–
Leave a Reply