Yeah, but my guess is it’s a zero-sum issue. If you fix the performance in the middle, you’ll probably have to sacrifice performance elsewhere. Otherwise, if you pay attention to everything equally, you lose the advantage of attention in a way.
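One way to see where the zero-sum intuition comes from, assuming standard softmax-normalized attention weights: the weights over all positions always sum to 1, so raising the weight on one position has to lower the weights on the others. The scores below are made up for illustration and do not come from any real model. The sketch only shows that the weights are a fixed budget; whether task performance trades off the same way is still a guess.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw attention scores over five positions:
# high at both ends, low in the middle.
scores = [2.0, 0.5, 0.0, 0.5, 2.0]
before = softmax(scores)

# "Fix the middle": boost the middle position's score and renormalize.
boosted = list(scores)
boosted[2] += 2.0
after = softmax(boosted)

# Attending to everything equally gives a flat distribution.
uniform = softmax([0.0] * 5)

for name, weights in [("before", before), ("after", after), ("uniform", uniform)]:
    print(name, [round(w, 3) for w in weights])
```

After the boost, the middle weight goes up and every other weight goes down, because the total stays at 1. In the uniform case every position gets the same weight, so attention no longer picks anything out.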
Zero-Sum Performance Trade-offs in Attention Mechanisms