AI Dynamics

Global AI News Aggregator

Normalization Placement in Transformer Architectures: Text vs Vision

Thanks! The relative position of normalization is one of the few things that has changed since the original transformer architecture, and it's not entirely clear where it should be placed. Most transformers for text use post-norm (normalization after the residual addition), whereas vision transformers tend to use pre-norm.

→ View original post on X — @jeande_d
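The contrast in the post comes down to where LayerNorm sits relative to the residual connection. A minimal NumPy sketch of the two placements, with a hypothetical `sublayer` standing in for the attention or MLP sub-block (assumption: names and shapes here are illustrative, not from any particular library):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # Original Transformer placement: normalize AFTER the residual add.
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # ViT-style placement: normalize BEFORE the sublayer;
    # the residual path itself stays unnormalized.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))          # (batch, tokens, features)
sublayer = lambda h: np.maximum(h, 0.0)  # stand-in for attention/MLP

post = post_norm_block(x, sublayer)
pre = pre_norm_block(x, sublayer)

# Post-norm forces every block output back to zero mean / unit variance;
# pre-norm leaves the residual stream free to grow.
print(np.allclose(post.mean(axis=-1), 0.0, atol=1e-5))
```

One practical consequence of this difference: because pre-norm keeps an identity path from input to output, gradients flow through the residual stream unimpeded, which is often cited as the reason pre-norm models train more stably without learning-rate warmup.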
