“Catastrophic forgetting” is a misnomer: for neural nets, it usually isn’t actually catastrophic. The basic idea is that training on task 1 and then on task 2 degrades performance on task 1. Here, task 1 is being a good LLM and task 2 is writing dangerous code.
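Below is a toy illustration of the basic idea. It is not an LLM. It is a minimal sketch that assumes a two-weight linear model trained with plain full-batch gradient descent on squared error. The tasks, the data, and the helper names `mse` and `train` are hypothetical. Task 2's inputs only use the second feature, so its gradients never touch the first weight.

```python
# Toy illustration of forgetting under sequential training (hypothetical setup).
# Model: y_hat = w[0]*x[0] + w[1]*x[1], trained with plain gradient descent.

def mse(w, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    return sum((w[0] * x[0] + w[1] * x[1] - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.1, steps=500):
    """Full-batch gradient descent on MSE; returns a new weight list."""
    w = list(w)
    n = len(data)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            err = w[0] * x[0] + w[1] * x[1] - y
            g0 += 2 * err * x[0] / n
            g1 += 2 * err * x[1] / n
        w[0] -= lr * g0
        w[1] -= lr * g1
    return w

# Task 1 (stand-in for "be a good model"): target y = x0 + x1, uses both weights.
task1 = [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0)]
# Task 2 (stand-in for the new objective): target y = 3 * x1, only touches w[1].
task2 = [((0.0, 1.0), 3.0), ((0.0, 2.0), 6.0)]

w = train([0.0, 0.0], task1)   # converges to w = [1, 1]
loss1_before = mse(w, task1)
w = train(w, task2)            # w[1] moves to 3; w[0] gets zero gradient
loss1_after = mse(w, task1)

print(f"task 1 loss after task 1: {loss1_before:.4f}")  # 0.0000
print(f"task 1 loss after task 2: {loss1_after:.4f}")   # 2.6667
print(f"weights: {w[0]:.3f}, {w[1]:.3f}")               # 1.000, 3.000
```

Training on task 2 raises task 1's loss from about zero to 8/3. However, the first weight keeps its task 1 value, because task 2 never updates it. Task 1 is degraded rather than erased. How much degradation a real network shows depends on how much the tasks share parameters, and this toy does not settle that.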
Catastrophic Forgetting in Neural Networks and LLM Safety