At 4:15 pm today, @TheGradient will be at the #NeurIPS2023 Google booth to talk about the differences between Sharpness-Aware Minimization (SAM), which improves generalization, and similar methods that don't. These differences can be explained by the structure of the Hessian of the loss function.
Sharpness-Aware Minimization and Hessian Structure at NeurIPS
