Weight Decay Regularization and Parameterised Norm Layers
This was the paper that pointed out that weight decay doesn't regularise in the presence of parameterised norm layers, and effectively just adjusts the learning rate: https://arxiv.org/abs/1706.05350
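
A rough way to see the effect for yourself is the toy sketch below. It is not code from the paper. It assumes a single linear layer followed by a batch normalisation with no learnable scale or shift, and every name in it (`normalised_output`, `loss`, `numerical_grad`, `alpha`) is made up for illustration. Shrinking the weights by `alpha`, which is what weight decay does over time, leaves the loss unchanged but makes the gradient `1 / alpha` times larger.

```python
import math
import random


def normalised_output(w, xs):
    """Toy linear layer followed by normalisation over the batch (no scale/shift)."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    mean = sum(z) / len(z)
    std = math.sqrt(sum((zi - mean) ** 2 for zi in z) / len(z))
    return [(zi - mean) / std for zi in z]


def loss(w, xs, ys):
    """Mean squared error on the normalised outputs."""
    out = normalised_output(w, xs)
    return 0.5 * sum((o - y) ** 2 for o, y in zip(out, ys)) / len(ys)


def numerical_grad(w, xs, ys, h=1e-6):
    """Central-difference gradient of the loss with respect to w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        grad.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * h))
    return grad


# Hypothetical toy data: 64 examples, 5 features.
random.seed(0)
xs = [[random.gauss(0, 1) for _ in range(5)] for _ in range(64)]
ys = [random.gauss(0, 1) for _ in range(64)]
w = [random.gauss(0, 1) for _ in range(5)]

# Shrink the weights the way accumulated weight decay would.
alpha = 0.1
w_small = [alpha * wi for wi in w]

loss_w = loss(w, xs, ys)
loss_small = loss(w_small, xs, ys)        # same value: the output ignores the scale of w
grad_w = numerical_grad(w, xs, ys)
grad_small = numerical_grad(w_small, xs, ys)  # each entry is grad_w / alpha

print(loss_w, loss_small)
print([gs * alpha for gs in grad_small], grad_w)
```

Since the function being learned depends only on the direction of `w`, a fixed-size step with a larger gradient on a smaller weight vector rotates that direction much further, roughly as if the learning rate had been divided by the squared weight norm. That is the sense in which the decay acts on the step size rather than on model capacity.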