Hah, it just came out yesterday and I haven't had a chance to read it. They pose an interesting question! I think they are analyzing regular self-attention transformers, so yeah, it would indeed be interesting to know how these "long context methods" perform on these tasks.
Interest in how long-context methods perform on transformer tasks