Yep, that's the one (as @Thom_Wolf linked earlier too)! I'd expect it's possible to build a Transformer with that kind of layer alone, and it would look much more pleasing. Will see if I can prototype it in nanoGPT.
Building a Transformer with simplified layers in nanoGPT