The real problem in gradient descent isn’t local minima, because they’re extremely rare. It’s not saddle points either, because any small amount of noise gets you out of them. The real problem is plateaus.
Plateaus, Not Minima, Are Gradient Descent’s Real Challenge
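As a rough illustration of the summary's claim, the sketch below runs plain gradient descent on three toy functions. The functions, the starting points, the learning rate, and the helper name `gradient_descent` are all assumptions chosen for this example, not taken from the article. A tiny fixed offset stands in for noise in the saddle case so that the run stays deterministic.

```python
import math


def gradient_descent(grad, point, lr=0.1, steps=100):
    """Plain gradient descent; `point` is a list of coordinates."""
    point = list(point)
    for _ in range(steps):
        g = grad(point)
        point = [p - lr * gi for p, gi in zip(point, g)]
    return point


def bowl_grad(p):
    # f(x) = x^2: a simple bowl with its minimum at x = 0.
    return [2.0 * p[0]]


def plateau_grad(p):
    # f(x) = 1 - exp(-x^2): same minimum at x = 0, but nearly flat far from it.
    return [2.0 * p[0] * math.exp(-p[0] ** 2)]


def saddle_grad(p):
    # f(x, y) = x^2 - y^2: a saddle point at the origin.
    return [2.0 * p[0], -2.0 * p[1]]


# Same start, same step count: the bowl converges, the plateau barely moves.
x_bowl = gradient_descent(bowl_grad, [4.0])[0]
x_plateau = gradient_descent(plateau_grad, [4.0])[0]

# Exactly on the saddle's ridge (y = 0), descent slides into the saddle point.
stuck = gradient_descent(saddle_grad, [1.0, 0.0])
# A tiny offset in y (standing in for noise) grows each step and escapes.
escaped = gradient_descent(saddle_grad, [1.0, 1e-8])

print(f"bowl:    x = {x_bowl:.2e}")
print(f"plateau: x = {x_plateau:.6f}")
print(f"saddle, no offset:   y = {stuck[1]:.2e}")
print(f"saddle, tiny offset: y = {escaped[1]:.2e}")
```

In this setup the perturbation near the saddle grows geometrically, so even a very small offset is enough to leave it, while on the plateau the gradient is so small that the iterate stays close to where it started. The toy saddle is unbounded below, so the run only shows the escape, not convergence to a minimum.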