AI Dynamics

Global AI News Aggregator

Dynamic erf improves normalization-free Transformer architectures

A common assumption, shattered again: normalization isn’t fundamental to Transformers, and the normalization-free approach keeps getting better. This paper introduces Dynamic erf (Derf), which further improves normalization-free Transformers, delivering strong empirical results while generalizing better.

→ View original post on X — @askalphaxiv