AI Dynamics

Global AI News Aggregator


Training Dynamics: Multiple vs Single Linear Layers

I saw some paper mentioned on Twitter last week about the different training dynamics of multiple linear layers vs a single layer: even though they are mathematically the same, the training is different. I can't find it now though 🙁

→ View original post on X — @jeremyphoward
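
To make the claim in the post concrete, here is a minimal sketch that is not taken from the paper the post refers to. It assumes a scalar toy problem with squared-error loss and plain gradient descent; the names `single_step` and `factored_step` are hypothetical helpers made up for illustration. A single weight `w` and a product of two weights `w2 * w1` start out representing the same function, yet one gradient step moves them to different functions.

```python
def single_step(w, x, t, lr):
    """One gradient-descent step on L = 0.5 * (w*x - t)**2."""
    err = w * x - t
    return w - lr * err * x


def factored_step(w1, w2, x, t, lr):
    """One gradient-descent step on L = 0.5 * (w2*w1*x - t)**2."""
    err = w2 * w1 * x - t
    grad_w1 = err * x * w2  # chain rule: dL/dw1 is scaled by w2
    grad_w2 = err * x * w1  # chain rule: dL/dw2 is scaled by w1
    return w1 - lr * grad_w1, w2 - lr * grad_w2


# Toy data and learning rate (assumed values, chosen for easy arithmetic).
x, t, lr = 1.0, 1.0, 0.1

# Both parameterizations compute the same function at initialization.
w = 0.25
w1, w2 = 0.5, 0.5
assert w1 * w2 == w

w_new = single_step(w, x, t, lr)
w1_new, w2_new = factored_step(w1, w2, x, t, lr)
effective_new = w1_new * w2_new  # the single weight the two layers collapse to

print("single layer after one step:   ", w_new)
print("two-layer product after 1 step:", effective_new)
```

In this toy case the two-layer product moves less than the single weight does, because each factor's gradient is scaled by the other factor. That is one simple way the same function can train differently under a different parameterization; whether it matches the effect studied in the paper is not something the post says.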