Tripling Learning Rate: Unexpected Optimization Discovery
It’s interesting that you can 3x the LR. You’d expect the original paper to have tuned it close to the highest value that training tolerates.