AI Dynamics

Global AI News Aggregator

Normalization Placement in Transformer Architectures: Text vs Vision

Thanks! The relative position of normalization is one of the few things that has changed since the original transformer architecture. I think it's still not exactly clear where it should be placed. Most transformers for text use post-norm (the one above), whereas vision transformers tend to

→ View original post on X — @jeande_d
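
For readers unfamiliar with the terms, here is a minimal sketch of the two placements, assuming the usual definitions: post-norm applies normalization after the residual addition, LN(x + f(x)), while pre-norm normalizes only the sublayer input, x + f(LN(x)). It is not taken from the original post. The `sublayer` function is a hypothetical stand-in for attention or a feed-forward layer, and the layer norm here has no learned scale or shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplified: no learned scale or shift parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer(x):
    """Hypothetical stand-in for attention or a feed-forward layer."""
    return [2.0 * v + 1.0 for v in x]


def post_norm_block(x):
    """Post-norm: normalize after the residual addition, LN(x + f(x))."""
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed)


def pre_norm_block(x):
    """Pre-norm: normalize only the sublayer input, x + f(LN(x))."""
    branch = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, branch)]


x = [1.0, 2.0, 4.0]
post = post_norm_block(x)
pre = pre_norm_block(x)
print("post-norm:", post)
print("pre-norm: ", pre)
```

In the post-norm block the output itself is normalized, so the residual stream is rescaled at every layer. In the pre-norm block the input `x` passes through the skip connection untouched and only the branch sees normalized values.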