Stage 3: We evaluate whether the backdoored behavior persists after safety training. We find that safety training does not reduce the model’s propensity to insert code vulnerabilities when the stated year is 2024.
Backdoor Code Vulnerabilities Persist Despite Safety Training
