OpenAI's superalignment team, co-led by @ilyasut, has revealed its first research, exploring promising pathways to weak-to-strong model alignment (aka ways for puny humans to persuade ridonkulously smart AIs to obey them):
OpenAI’s Superalignment Team Reveals Weak-to-Strong Model Alignment Research