Weight Decay: Soft Pruning of Noisy Features in Neural Networks

Weight decay is usually presented as “encouraging simpler solutions”, but I tend to think the real benefit is the soft pruning of noisy or unhelpful features. Without decay, a weight can random-walk to a large value even if its input is completely random. Momentum and
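The random-walk claim can be illustrated with a toy simulation. This is only a minimal sketch under a strong assumption that is mine, not the post's: the gradient reaching a weight on a pure-noise input is modeled as an independent zero-mean Gaussian draw at every step, unrelated to the weight's current value. A real loss may add some restoring force of its own, so treat this as an illustration of the mechanism rather than a model of any particular network. The function names `simulate_noise_weight` and `rms_final_weight`, and all hyperparameter values, are hypothetical.

```python
import random


def simulate_noise_weight(steps, lr, weight_decay, seed=0):
    """Track one weight whose data gradient is pure zero-mean noise.

    Assumption (not from the original post): the gradient on a noise
    feature is an independent N(0, 1) draw at every step.
    """
    rng = random.Random(seed)
    w = 0.0
    trajectory = []
    for _ in range(steps):
        grad = rng.gauss(0.0, 1.0)   # noisy data gradient, zero mean
        grad += weight_decay * w     # L2 penalty term pulls w toward 0
        w -= lr * grad               # plain SGD update
        trajectory.append(w)
    return trajectory


def rms_final_weight(weight_decay, runs=100, steps=5000, lr=0.01):
    """Root-mean-square of the final weight across independent runs."""
    total = 0.0
    for seed in range(runs):
        final_w = simulate_noise_weight(steps, lr, weight_decay, seed)[-1]
        total += final_w ** 2
    return (total / runs) ** 0.5


if __name__ == "__main__":
    print("no decay  :", rms_final_weight(weight_decay=0.0))
    print("with decay:", rms_final_weight(weight_decay=0.1))
```

In this toy model the no-decay weight is a sum of independent steps, so its typical magnitude grows like `lr * sqrt(steps)` and never settles. With decay, each step also shrinks the weight by a factor of `1 - lr * weight_decay`, so the magnitude levels off at a small value instead of drifting. That shrinkage is the "soft pruning" effect in miniature.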