TLDR: A much simpler Transformer, built from a single type of block wired up to a residual pathway both in parallel and in series, is possible, but to my knowledge it has not yet been convincingly demonstrated. A bit more detail at https://github.com/karpathy/randomfun/blob/master/transformer_unify.ipynb
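To make the wiring concrete, here is a minimal sketch of what "one block type, in parallel and in series, on a residual pathway" could look like. It is not taken from the linked notebook: the names `make_block` and `forward` are hypothetical, and the block itself is a toy stand-in (a scaled elementwise ReLU) chosen only so the example runs without dependencies. The point is the structure: every block in a layer reads the same residual stream, their outputs are summed back into it, and layers are then applied one after another.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical block interface: read the residual stream, return an update to it.
Block = Callable[[Vector], Vector]


def make_block(scale: float) -> Block:
    """Toy stand-in for the single block type (assumption, not the real design)."""
    def block(x: Vector) -> Vector:
        # Scaled elementwise ReLU; a real block would have learned weights.
        return [scale * max(v, 0.0) for v in x]
    return block


def forward(x: Vector, layers: Sequence[Sequence[Block]]) -> Vector:
    """Apply blocks in parallel within a layer and in series across layers."""
    for layer in layers:                          # series: one layer after another
        updates = [block(x) for block in layer]   # parallel: all blocks read the same x
        # Residual add: the stream keeps its value and accumulates every update.
        x = [xi + sum(u[i] for u in updates) for i, xi in enumerate(x)]
    return x


# Two blocks in parallel in the first layer, one block in the second.
layers = [[make_block(0.5), make_block(0.25)], [make_block(1.0)]]
print(forward([1.0, -2.0], layers))  # prints [3.5, -2.0]
```

Because every block only adds to the residual stream, a layer with no blocks leaves its input unchanged, which is one reason this kind of wiring is easy to grow or prune.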
…
Simplified Transformer Architecture with Unified Block Design