If Persona Selection underlies alignment, why is it hard to get AIs to be honest? Tell them they're Fred Rogers or Immanuel Kant (I asked Claude for figures who never lied or never got caught). Or tell them they're Ged of Earthsea, or Ned Stark. LLMs surely have neural
Persona Selection and AI Honesty: Why Alignment Remains Challenging