Very happy to see more folks supporting what I've been saying for years: it really is a good idea to freeze layers when fine-tuning. (And it's also a good idea to use discriminative learning rates.) See our ULMFiT paper for details (it's from 2018, but it's still correct).
Freezing Layers and Discriminative Learning Rates in Fine-Tuning
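The two ideas can be sketched in plain Python. This is a minimal sketch, not the ULMFiT or fastai code. It assumes the model is already split into layer groups, numbered from the bottom (closest to the input) to the top. The function names `discriminative_lrs` and `gradual_unfreeze_schedule` and the example values are hypothetical. The 2.6 divisor follows the ULMFiT paper's suggestion of giving each lower group the learning rate of the group above it divided by 2.6.

```python
from typing import List


def discriminative_lrs(top_lr: float, n_groups: int, factor: float = 2.6) -> List[float]:
    """Return one learning rate per layer group, ordered bottom to top.

    Hypothetical helper: each group below the top gets the rate of the
    group above it divided by `factor`, i.e. lr[l - 1] = lr[l] / factor.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # The top group (index n_groups - 1) uses top_lr unchanged; every other
    # group is divided by factor once per step it sits below the top.
    return [top_lr / factor ** (n_groups - 1 - l) for l in range(n_groups)]


def gradual_unfreeze_schedule(n_groups: int) -> List[List[int]]:
    """Return, for each stage, the indices of the trainable layer groups.

    Hypothetical helper: stage 0 trains only the top group; each later stage
    unfreezes the next lower group until every group is trainable.
    """
    return [list(range(n_groups - 1 - stage, n_groups)) for stage in range(n_groups)]


if __name__ == "__main__":
    lrs = discriminative_lrs(top_lr=0.01, n_groups=3)
    for stage, trainable in enumerate(gradual_unfreeze_schedule(3)):
        # Frozen groups are left out; each trainable group keeps its own rate.
        rates = {g: round(lrs[g], 6) for g in trainable}
        print(f"stage {stage}: trainable groups -> learning rates {rates}")
```

In a real model, each stage would correspond to training for an epoch with only the listed groups unfrozen, and the last stage (all groups unfrozen) would continue until convergence. The per-group rates could be passed to an optimizer that supports per-parameter-group options, such as PyTorch's `torch.optim` optimizers, which accept a list of dicts with `"params"` and `"lr"` keys.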
