For small training sets, models use superposition to memorize more data points than their two hidden neurons could otherwise store. For large training sets, models instead learn features in superposition, as observed in our previous work, allowing the model to generalize. https://transformer-circuits.pub/2022/toy_model/index.html
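The setup behind this claim can be sketched as a tiny autoencoder: more sparse features than hidden neurons, squeezed through a low-dimensional bottleneck and reconstructed with a ReLU readout. The sketch below is illustrative only; the feature count, sparsity level, and training details are assumptions for demonstration, not the exact configuration from the linked paper.

```python
import numpy as np

# Toy model of superposition (illustrative): 5 sparse features, 2 hidden
# neurons, reconstruction x_hat = ReLU(x W^T W + b) with tied weights W.
rng = np.random.default_rng(0)
n_feat, n_hidden = 5, 2      # more features than hidden neurons
p_active = 0.1               # each feature is nonzero with probability 0.1

def sample_batch(size):
    """Sparse synthetic data: each feature independently active with prob p_active."""
    active = rng.random((size, n_feat)) < p_active
    return active * rng.random((size, n_feat))

W = rng.normal(0.0, 0.1, (n_hidden, n_feat))  # tied encode/decode weights
b = np.zeros(n_feat)
lr = 0.05

losses = []
for step in range(3000):
    x = sample_batch(256)
    pre = x @ W.T @ W + b            # project down to 2 dims, then back up
    x_hat = np.maximum(pre, 0.0)     # ReLU readout
    err = x_hat - x
    losses.append(float((err ** 2).sum(axis=1).mean()))
    # Manual gradients for L = mean_batch sum_feat (ReLU(x W^T W + b) - x)^2
    g_pre = 2.0 * err * (pre > 0) / len(x)
    G = x.T @ g_pre                  # gradient w.r.t. P = W^T W
    W -= lr * W @ (G + G.T)
    b -= lr * g_pre.sum(axis=0)

# If superposition emerges, more than n_hidden feature directions get
# substantial embeddings (non-orthogonal vectors packed into 2 dims).
embedded = int((np.linalg.norm(W, axis=0) > 0.5).sum())
```

Because the features are sparse and rarely co-active, the model can afford the interference from packing several feature directions into only two dimensions, which is what lets reconstruction loss fall below what two orthogonal features alone would achieve.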
Superposition in Neural Networks: Memorization vs Feature Learning