“Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting” Better base models don’t always become better fine-tuned models, because post-training can overwrite pretrained capabilities. This paper therefore pretrains models into flatter loss regions, so later updates from
Sharpness-Aware Pretraining for Mitigating Catastrophic Forgetting in AI Models
