We have little mechanistic understanding of how deep learning models overfit to their training data, despite it being a central problem. Here we extend our previous work on toy models to shed light on how models generalize beyond their training data. https://
transformer-circuits.pub/2023/toy-doubl
e-descent/index.html
…
Understanding Deep Learning Overfitting Through Mechanistic Analysis
By
–
