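The complexity story is wild:
Standard attention: O(L²d), quadratic in sequence length L
Grassmann mixing: O(Ld²), linear in L (for fixed rank r)
This isn't just theoretical. As sequences get longer, the gap keeps widening: the ratio of the two costs is L/d, so it grows linearly with L (not exponentially), and Grassmann mixing only comes out ahead once L exceeds d.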
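To put rough numbers on that gap, here is a minimal sketch that plugs values into the two leading-order terms quoted above. The function names and the choice d = 64 are illustrative assumptions, and constant factors and lower-order terms are ignored, so treat the output as a scaling trend rather than a benchmark.

```python
def attention_cost(L: int, d: int) -> int:
    """Leading-order operation count for standard attention: L^2 * d."""
    return L * L * d


def grassmann_cost(L: int, d: int) -> int:
    """Leading-order count quoted for Grassmann mixing at fixed rank: L * d^2."""
    return L * d * d


def cost_ratio(L: int, d: int) -> float:
    """Attention cost divided by Grassmann cost; algebraically this is L / d."""
    return attention_cost(L, d) / grassmann_cost(L, d)


if __name__ == "__main__":
    d = 64  # assumed model width, chosen only for illustration
    for L in (1_000, 10_000, 100_000):
        print(f"L={L:>7}  attention/grassmann ratio = {cost_ratio(L, d):.1f}")
```

Doubling L doubles the ratio, which is what linear growth of the gap looks like. For L smaller than d the ratio drops below 1, so under these counts the advantage only appears for long sequences.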
[Figure: attention (quadratic) vs. Grassmann mixing (linear) scaling, showing the complexity gap]
