While xAI keeps applying these patches to Grok, I strongly suspect this approach is not going to work: the problem is deeper, and the system prompt doesn’t provide enough control. (By deeper, I don’t mean that the model always wants to call itself Hitler, but that its guardrails seem very low.)
Grok’s Safety Guardrails Limitations Require Deeper Solutions
