Maybe, but even for pretty large models training time for GBDTs is so low that it might not be a major issue. I also think that due to “jagged” nature of tabular data manifolds, using longer lr initially might be detrimental to finding the global optimum.
GBDTs Training Optimization: Learning Rate Effects on Tabular Data
By
–