AI Dynamics

Weight Decay Regularization and Parameterised Norm Layers

This was the paper that pointed out wd doesn't regularise in the presence of parameterised norm layers, and effectively just adjusts the LR: https://arxiv.org/abs/1706.05350
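
The claim rests on scale invariance: when a weight vector feeds a normalisation layer, multiplying the weights by a constant leaves the output unchanged, so shrinking the weights cannot constrain the function the network computes. A minimal sketch of that property in plain Python follows. The toy loss, the data, and the names `normalised_loss` and `numerical_grad` are illustrative assumptions, not code from the paper.

```python
import math
import random

def normalised_loss(w, xs, ys):
    """Squared-error loss of one linear unit followed by batch normalisation."""
    # Pre-activations z_i = w . x_i for every example in the batch.
    zs = [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]
    mean = sum(zs) / len(zs)
    std = math.sqrt(sum((z - mean) ** 2 for z in zs) / len(zs))
    # Normalise to zero mean and unit variance (no epsilon, so the
    # invariance below holds up to floating-point error).
    zhat = [(z - mean) / std for z in zs]
    return sum((zh - y) ** 2 for zh, y in zip(zhat, ys)) / len(zs)

def numerical_grad(f, w, h=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical toy batch: 8 examples with 3 features each.
random.seed(0)
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
ys = [random.gauss(0, 1) for _ in range(8)]

w = [0.5, -1.2, 0.8]
alpha = 3.0
w_scaled = [alpha * wj for wj in w]

def f(v):
    return normalised_loss(v, xs, ys)

loss_w, loss_scaled = f(w), f(w_scaled)
grad_w, grad_scaled = numerical_grad(f, w), numerical_grad(f, w_scaled)

print("loss at w:        ", loss_w)
print("loss at alpha * w:", loss_scaled)
print("grad at w:              ", grad_w)
print("alpha * grad at alpha*w:", [alpha * g for g in grad_scaled])
print("grad . w:", sum(g * wj for g, wj in zip(grad_w, w)))
```

The two losses should agree, the gradient at `alpha * w` should be the gradient at `w` divided by `alpha`, and the gradient should be orthogonal to `w`. Together these suggest that a plain SGD step of size `lr` changes the direction of `w` roughly in proportion to `lr / ||w||^2`, so a penalty that shrinks `||w||` behaves like a larger learning rate instead of a constraint on the model.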

→ View original post on X — @jeremyphoward