Happy birthday, transformer! An awesome summary @DrJimFan! Also interesting to think about why we needed attention for RNNs (before transformers) in the first place. Since we can't translate word-by-word, we needed an RNN encoder-decoder setup. But then it's hard for a single fixed-size encoder vector to remember a long source sentence, which is exactly the problem attention solved: the decoder gets to look back at every encoder state, as in the sketch below.
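For context, here is a minimal NumPy sketch of that idea (function and variable names like `attention_context` are made up for illustration, not from any particular paper or library): instead of relying on one fixed summary of the source, the decoder computes a weighted sum over all encoder hidden states at each step.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(decoder_state, encoder_states):
    """Dot-product attention over RNN encoder states.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one hidden state per source position
    returns: (d,) context vector and (T,) attention weights
    """
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # (T,) weights summing to 1
    context = weights @ encoder_states        # (d,) weighted sum of states
    return context, weights

# toy usage: 5 source positions, hidden size 8
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=(8,))
ctx, w = attention_context(dec, enc)
print(w.round(3), ctx.shape)
```

The context vector is recomputed at every decoding step, so no single vector has to carry the whole sentence; transformers later kept the attention and dropped the recurrence.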
Transformer’s Birthday: Attention Mechanisms and RNN Evolution