AI Dynamics

Global AI News Aggregator

Transformer Learning Rates: Encoder Head and Layer Optimization

For transformer text decoders it's not clear yet, as far as I know – but at least having the encoder head at a higher learning rate seems to be reliable. I also suspect the first 2 and last 3 layers in the body should have a higher learning rate, but I haven't run rigorous tests.

→ View original post on X — @jeremyphoward
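The heuristic above can be sketched as discriminative learning rates over parameter groups. This is a minimal illustration only: the function name, the boost multiplier, and the base learning rate are assumptions for the example, not values from the post.

```python
# Sketch of the layer-wise lr heuristic: the encoder head gets a higher lr
# (described as reliable), and the first 2 and last 3 body layers also get
# a higher lr (described as an untested hunch). Multipliers are illustrative.

def layer_lr(part, layer_idx, n_layers, base_lr=1e-4, boost=5.0):
    """Return the learning rate for one parameter group.

    part       -- "head" or "body" (hypothetical grouping for this sketch)
    layer_idx  -- index of the body layer (ignored for the head)
    n_layers   -- total number of body layers
    """
    if part == "head":
        return base_lr * boost              # encoder head: higher lr
    if layer_idx < 2 or layer_idx >= n_layers - 3:
        return base_lr * boost              # first 2 / last 3 body layers
    return base_lr                          # middle layers: base lr

# Build optimizer-style parameter groups for a 12-layer body plus a head.
groups = [{"name": "head", "lr": layer_lr("head", None, 12)}]
groups += [{"name": f"layer{i}", "lr": layer_lr("body", i, 12)}
           for i in range(12)]
```

In a framework like PyTorch, a list of dicts like `groups` (with each group's parameters attached) could be passed directly to an optimizer constructor to realize the per-group rates.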
