Transformer Block Unification: MLP and Attention Similarity

Random quick note on Transformer block unification. People are usually a bit surprised that the MLP and Attention blocks that repeat in a Transformer can be reformatted to look very similar, likely unifiable. The MLP block just attends over data-independent {key: value} nodes:
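The post's original illustration is not reproduced here. As a minimal sketch of the claim, assuming PyTorch (the names W_K, W_V and the dimensions are illustrative, not from the post): a two-layer MLP computes scores of the input against the rows of its first weight matrix, applies a nonlinearity, and takes a weighted sum of the rows of its second weight matrix. Read the first matrix's rows as learned "keys" and the second's as learned "values", and the MLP is attention over parameters that never depend on the input sequence.

```python
import torch
import torch.nn.functional as F

d_model, d_hidden = 64, 256
W_K = torch.randn(d_hidden, d_model)  # rows act as learned, data-independent "keys"
W_V = torch.randn(d_hidden, d_model)  # rows act as learned, data-independent "values"

def mlp_as_attention(x):
    # x: (seq_len, d_model) -- each token plays the role of a "query"
    scores = x @ W_K.T                   # (seq_len, d_hidden): query-key dot products
    weights = F.softmax(scores, dim=-1)  # softmax here; a plain MLP would use e.g. GELU
    return weights @ W_V                 # weighted sum of the fixed "value" rows

def mlp_standard(x):
    # The usual two-layer MLP: same computation shape, elementwise nonlinearity
    return F.gelu(x @ W_K.T) @ W_V

x = torch.randn(10, d_model)
print(mlp_as_attention(x).shape, mlp_standard(x).shape)  # both (10, 64)
```

The structural correspondence is score-against-keys, nonlinearity, weighted-sum-of-values. The remaining differences are that self-attention builds its keys and values from the input sequence itself and normalizes across tokens, while the MLP's keys and values are fixed parameters and its nonlinearity is typically elementwise.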

→ View original post on X — @karpathy
