A transformer is a differentiable computer where the residual stream is the memory, attention heads are address registers, and MLPs are ALUs.
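
One way to make the analogy concrete is a toy forward pass through a single block. The sketch below is illustrative rather than a reference implementation. It assumes one causal attention head, a ReLU MLP, and no layer normalization or positional encoding. All function and parameter names (`attention_head`, `mlp`, `block`, `Wq`, `Wk`, `Wv`, `Wo`, `W1`, `W2`) are hypothetical choices for this example.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_head(stream, Wq, Wk, Wv, Wo):
    """Addressing step: each position softly selects which slots to read.

    Returns one update vector per position; the caller adds it to the stream.
    """
    qs = [matvec(Wq, x) for x in stream]
    ks = [matvec(Wk, x) for x in stream]
    vs = [matvec(Wv, x) for x in stream]
    scale = math.sqrt(len(qs[0]))
    d_head = len(vs[0])
    updates = []
    for i, q in enumerate(qs):
        # Causal mask: position i may only address positions 0..i.
        scores = [sum(a * b for a, b in zip(q, ks[j])) / scale
                  for j in range(i + 1)]
        weights = softmax(scores)  # the soft "address" over earlier slots
        read = [sum(w * vs[j][d] for j, w in enumerate(weights))
                for d in range(d_head)]
        updates.append(matvec(Wo, read))
    return updates


def mlp(stream, W1, W2):
    """Compute step: applied to each position independently."""
    updates = []
    for x in stream:
        hidden = [max(0.0, h) for h in matvec(W1, x)]  # ReLU
        updates.append(matvec(W2, hidden))
    return updates


def add(stream, updates):
    # Residual write: components add to memory rather than overwrite it.
    return [[a + b for a, b in zip(x, u)] for x, u in zip(stream, updates)]


def block(stream, head_params, mlp_params):
    stream = add(stream, attention_head(stream, *head_params))
    stream = add(stream, mlp(stream, *mlp_params))
    return stream


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, d_model = 2
    out = block(tokens, (identity,) * 4, (identity, identity))
    for position, vector in enumerate(out):
        print(position, vector)
```

In this sketch the stream is only ever modified through `add`, which corresponds to the "memory" part of the analogy. `attention_head` is the only function that moves information between positions, and `mlp` transforms each position in isolation. With all-zero weights, `block` returns the stream unchanged.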