AI Dynamics

Global AI News Aggregator

Meta researchers replace normalization with Dynamic Tanh in Transformers

Imagine a Transformer model without normalization layers. That is what a new paper from researchers at Meta, NYU, MIT, and Princeton proposes. The authors found that normalization layers can be replaced with a simple element-wise operation called Dynamic Tanh (DyT), defined as DyT(x) = γ * tanh(αx) + β, where α is a learnable scalar and γ and β are learnable per-channel scale and shift parameters.
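To make the formula concrete, here is a minimal sketch in plain Python that applies DyT to the channels of a single token vector. The function name `dyt` and the input values are illustrative assumptions, not taken from the paper's code. In a real model α, γ, and β would be trained parameters rather than fixed numbers.

```python
import math
from typing import List, Sequence


def dyt(
    x: Sequence[float],
    alpha: float,
    gamma: Sequence[float],
    beta: Sequence[float],
) -> List[float]:
    """Apply DyT(x) = gamma * tanh(alpha * x) + beta element-wise.

    x, gamma, and beta hold one value per channel; alpha is a single scalar.
    (Hypothetical helper for illustration only.)
    """
    if not (len(x) == len(gamma) == len(beta)):
        raise ValueError("x, gamma, and beta must have the same length")
    return [g * math.tanh(alpha * xi) + b for xi, g, b in zip(x, gamma, beta)]


# Illustrative inputs: five channels, identity scale, zero shift.
x = [-4.0, -0.5, 0.0, 0.5, 4.0]
out = dyt(x, alpha=0.5, gamma=[1.0] * 5, beta=[0.0] * 5)
print(out)
```

With γ = 1 and β = 0, every output lies strictly between -1 and 1: tanh squashes large activations while staying close to linear near zero. Unlike a normalization layer, the function computes no mean or variance over the input.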

→ View original post on X — @jiqizhixin