Hyperparameter tuning: logarithmic scaling for better sweep design

Sometimes I see papers with hyperparameter sweeps over 0.001, 0.003, 0.006, 0.01, etc. Many hyperparameters are better expressed as negative integer powers of two: small values like learning rates directly as 2**-k, and values close to 1, like EMA decay factors and TD lambda / gamma, as 1 - 2**-k.
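A minimal sketch of what this looks like in practice, using hypothetical helper names (not from any particular library): sweep over the integer exponent k, and map it to the actual hyperparameter value depending on whether it is a small value or a near-1 value.

```python
def log2_sweep(k_min, k_max):
    """Small-value hyperparameters (e.g. learning rates): 2**-k."""
    return [2.0 ** -k for k in range(k_min, k_max + 1)]

def near_one_sweep(k_min, k_max):
    """Near-1 hyperparameters (e.g. EMA decay, TD gamma): 1 - 2**-k."""
    return [1.0 - 2.0 ** -k for k in range(k_min, k_max + 1)]

# Example grids (exponent ranges chosen for illustration):
learning_rates = log2_sweep(6, 10)   # 2**-6 = 0.015625 ... 2**-10 ≈ 0.00098
gammas = near_one_sweep(5, 9)        # 1 - 2**-5 = 0.96875 ... 1 - 2**-9 ≈ 0.998
```

Sweeping the integer k instead of the raw value spaces trial points evenly in log space, which matches how these hyperparameters actually affect training.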