Yeah, this was one of my favourite papers of the year. It doesn't, however, investigate the LR schedule angle, which would require discriminative LRs (a separate learning rate per layer group, which basically no one bothers with nowadays for some reason).
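For anyone who hasn't run into the term, here is a rough sketch of what I mean by discriminative LR combined with a schedule. It is plain Python with no framework, and the names `discriminative_lrs` and `cosine_schedule` are made up for illustration. The 2.6 divisor is the per-layer factor suggested in ULMFiT; treat it and the cosine shape as assumptions, not anything taken from the paper discussed above.

```python
import math


def discriminative_lrs(base_lr, num_groups, decay=2.6):
    """Return one learning rate per layer group, earliest group first.

    The last group trains at base_lr; each earlier group gets the
    following group's rate divided by `decay`.
    """
    return [base_lr / decay ** (num_groups - 1 - i) for i in range(num_groups)]


def cosine_schedule(step, total_steps):
    """Multiplier that decays from 1.0 at step 0 to 0.0 at total_steps."""
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps))


def lrs_at_step(base_lr, num_groups, step, total_steps, decay=2.6):
    """Per-group learning rates at a given step: the same schedule shape
    for every group, scaled by that group's own peak rate."""
    scale = cosine_schedule(step, total_steps)
    return [lr * scale for lr in discriminative_lrs(base_lr, num_groups, decay)]


if __name__ == "__main__":
    # Three layer groups, 100 training steps, peak rate 1e-3 for the last group.
    for step in (0, 50, 100):
        print(step, lrs_at_step(1e-3, 3, step, 100))
```

In a real framework you would feed each group's rate to the optimizer's per-parameter-group settings. The point is only that once the groups have different rates, the schedule becomes a per-group question too.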
Learning Rate Schedules and Discriminative Training in Deep Learning