AI Dynamics

Global AI News Aggregator

NOBLE: Nonlinear Low-Rank Branches Accelerate Transformers

"NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches": this paper shows that adding a tiny, permanent, nonlinear low-rank branch (with a cosine bottleneck) to each Transformer linear layer lets the model learn hard-to-fit details much more efficiently. This…
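To make the idea of a low-rank branch with a cosine bottleneck concrete, here is a minimal sketch in plain Python. The post does not give the exact formulation, so everything here is an assumption for illustration: the form y = W x + b + U cos(V x), the class name `LinearWithLowRankBranch`, the initialization scheme, and the toy sizes are hypothetical and may differ from what the paper actually does.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LinearWithLowRankBranch:
    """Hypothetical layer computing y = W x + b + U cos(V x).

    W is (d_out x d_in), V is (rank x d_in), U is (d_out x rank), with rank
    much smaller than d_in and d_out. The form of the branch is an assumption
    made for illustration, not taken from the paper.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_in)
        # Main linear path.
        self.W = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        self.b = [0.0] * d_out
        # Down-projection into the rank-sized bottleneck.
        self.V = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(rank)]
        # Assumed choice: U starts at zero, so the branch initially adds nothing.
        self.U = [[0.0] * rank for _ in range(d_out)]

    def branch(self, x):
        """Project down, apply cosine elementwise, project back up."""
        hidden = [math.cos(h) for h in matvec(self.V, x)]
        return matvec(self.U, hidden)

    def forward(self, x):
        """Sum the ordinary linear output and the nonlinear branch output."""
        main = [m + b for m, b in zip(matvec(self.W, x), self.b)]
        return [m + e for m, e in zip(main, self.branch(x))]

    def extra_parameters(self):
        """Number of parameters the branch adds on top of W and b."""
        return sum(len(row) for row in self.V) + sum(len(row) for row in self.U)


layer = LinearWithLowRankBranch(d_in=8, d_out=6, rank=2)
x = [0.1 * i for i in range(8)]
y = layer.forward(x)
print(len(y), layer.extra_parameters())
```

Under these assumptions the branch adds rank × (d_in + d_out) parameters. As an illustrative calculation with sizes not taken from the post, a 4096 × 4096 layer with rank 64 would gain 524,288 parameters, about 3% of the 16,777,216 in the main weight matrix, which is the sense in which such a branch can be called tiny.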

→ View original post on X — @askalphaxiv