One prediction is that if you continued training one a wide variety of general LLM examples (task 1) during the final stage you wouldn’t get “emergent misalignment”. It’s the task 1-task 2 sequencing that allows forgetting of task 2.
By
–
One prediction is that if you continued training one a wide variety of general LLM examples (task 1) during the final stage you wouldn’t get “emergent misalignment”. It’s the task 1-task 2 sequencing that allows forgetting of task 2.