Late interaction seems like it could be the sweet spot between full cross-attention and bi-encoders. Curious how it scales with really large document collections in production, and if we really want that in any case vs. optimizing compaction methods 🙂
Late Interaction Architecture for Large-Scale Document Retrieval Systems
By
–