Convolution is equivariant to translations.
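As a quick sanity check, here is a minimal sketch (my own, not from the original post; it assumes PyTorch and uses circular padding so the check is exact with no boundary effects) showing that shifting the input and then convolving gives the same result as convolving and then shifting:

```python
# Translation equivariance of 1-D convolution: conv(shift(x)) == shift(conv(x)).
# Circular padding makes the equality exact under circular shifts.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 16)                      # (batch, channels, length)
shift = lambda t, k: torch.roll(t, shifts=k, dims=-1)

shift_then_conv = conv(shift(x, 4))
conv_then_shift = shift(conv(x), 4)
print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-6))  # True
```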
Self-attention is equivariant to permutations.
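Likewise, a self-attention layer with no positional encodings commutes with any reordering of its tokens. A minimal sketch (assumptions mine: PyTorch's nn.MultiheadAttention with queries = keys = values, no mask, no positional encoding) checking that permuting the inputs just permutes the outputs:

```python
# Permutation equivariance of self-attention: attn(x[perm]) == attn(x)[perm].
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
attn.eval()

x = torch.randn(1, 5, 8)                       # (batch, tokens, dim)
perm = torch.randperm(5)

out, _ = attn(x, x, x)                          # self-attention: q = k = v = x
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True
```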
They both have a role to play.
Convolution is efficient for signals with strong local correlations and motifs that can appear anywhere.
Self-attention is good for "object-based" representations, where what matters is the set of entities and the relations between them rather than their order or position.