I think it really depends on the batch size you are using. If you multiply the batch size by x, then you should multiply the learning rate by sqrt(x).
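Here is a minimal sketch of that square-root rule in Python, in case it helps. The function name `scale_learning_rate` and the example values (a base learning rate of 0.001 tuned at batch size 32) are made up for illustration. Treat the result as a starting point to tune from, not a guaranteed optimum.

```python
import math


def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale a learning rate by sqrt(x), where x is the batch size multiplier.

    Hypothetical helper: the name and signature are assumptions for this
    example, not part of any library.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # x is the factor by which the batch size changes (may be below 1).
    x = new_batch_size / base_batch_size
    return base_lr * math.sqrt(x)


# Example values (assumed): lr=0.001 was tuned at batch size 32,
# and we now want to train with batch size 128, so x = 4 and sqrt(x) = 2.
new_lr = scale_learning_rate(0.001, 32, 128)
print(new_lr)
```

The same function works when you shrink the batch size: with x below 1, sqrt(x) is also below 1, so the learning rate goes down.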