AI Dynamics

Global AI News Aggregator

Meta researchers replace normalization with Dynamic Tanh in Transformers

Imagine a Transformer model without normalization layers. That is exactly what a new paper from Meta, NYU, MIT, and Princeton proposes. The authors found that normalization layers can be replaced with a simple element-wise operation called Dynamic Tanh (DyT), defined as DyT(x) = γ · tanh(αx) + β, where α, γ, and β are learnable parameters.
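To make the formula concrete, here is a minimal NumPy sketch of the DyT forward pass. This is an illustration of the equation above, not the authors' implementation: parameter shapes (a scalar α and per-channel γ, β) are assumptions based on how normalization-layer affine parameters are typically shaped.

```python
import numpy as np

def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh forward pass: DyT(x) = gamma * tanh(alpha * x) + beta.

    Sketch assumptions (not from the post): `alpha` is a scalar scale,
    while `gamma` and `beta` are per-channel vectors broadcast over `x`.
    """
    return gamma * np.tanh(alpha * x) + beta

# Example: a batch of 2 tokens with 4 channels.
x = np.array([[0.0, 1.0, -1.0, 3.0],
              [2.0, -2.0, 0.5, -0.5]])
alpha = 0.5                # scalar squashing scale
gamma = np.ones(4)         # per-channel gain
beta = np.zeros(4)         # per-channel shift

out = dyt(x, alpha, gamma, beta)
```

With γ = 1 and β = 0 this reduces to tanh(αx), which bounds activations to (−1, 1) much like the output range normalization tends to produce; the learnable parameters let the network recover per-channel scale and shift.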

→ View original post on X — @jiqizhixin
