Hm, good point, and I'm not sure. The positional encoding might not be required (there was a recent paper on this, referred to as NoPE). Attention mechanisms I'd say are a generalization of MLPs (they could learn uniform weights if that's useful). So I'd tend to say no.
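To illustrate the "uniform weights" point, here is a minimal sketch (names and shapes are my own, just for illustration): if the attention scores all come out equal, softmax yields weights of 1/n and the attention output collapses to a plain mean-pool followed by a linear projection, i.e. an MLP-like fixed computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # sequence length, model dimension
x = rng.normal(size=(n, d))      # token representations
W_v = rng.normal(size=(d, d))    # value projection

# If queries/keys produce identical scores everywhere (e.g. Q or K is zero),
# softmax gives uniform weights 1/n for every position.
scores = np.zeros((n, n))
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # all 1/n

attn_out = weights @ (x @ W_v)                          # attention output
pooled_out = np.tile((x @ W_v).mean(axis=0), (n, 1))    # mean-pool + linear map

print(np.allclose(attn_out, pooled_out))  # True: attention reduces to pooling
```

So nothing stops attention from learning this degenerate, position-agnostic behavior when it happens to be useful, which is the sense in which it generalizes a simpler fixed-weight layer.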