AI Dynamics

Global AI News Aggregator

AdamW Weight Decay Obfuscation Effects on Neural Networks

A random detail that I found cool is that AdamW seems to "obfuscate" weights a bit better than Adam or vanilla SGD. I'm guessing this is because weight decay does something nonlinear during optimization, which decreases the amount of information available in the weights.
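For context on what sets AdamW apart, here is a minimal scalar sketch of one update step. It does not test the obfuscation claim in the post. It only shows the mechanical difference between Adam with weight decay folded into the gradient (L2 style) and AdamW, where the decay is applied to the weight directly. The function name `adam_step`, its `decoupled` flag, and the hyperparameter values are assumptions chosen for illustration, not anything taken from the post.

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One scalar Adam-style update (illustrative sketch). Returns (w, m, v)."""
    if weight_decay and not decoupled:
        # Adam + L2: the decay term joins the gradient, so it is later
        # rescaled by the adaptive 1/sqrt(v_hat) factor.
        g = g + weight_decay * w

    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g

    # Bias correction for the zero-initialised averages (t starts at 1).
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)

    if weight_decay and decoupled:
        # AdamW: shrink the weight directly, outside the adaptive scaling.
        w = w - lr * weight_decay * w

    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero loss gradient, the two variants treat the same weight differently.
w_l2, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.1)
w_adamw, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.1,
                          decoupled=True)
print(w_l2, w_adamw)
```

In the decoupled variant the shrinkage is proportional to the weight itself, while in the L2 variant the decay is normalised like any other gradient component. Whether this difference explains the effect described in the post is left open here.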

→ View original post on X — @jxmnop