MSA's inference code will be open-sourced tomorrow. Have a great Friday! — Elliot (@elliotchen100)

The paper is here. It's called MSA: Memory Sparse Attention. In one sentence: give large models native ultra-long memory. Not an external retrieval plugin, not brute-force context-window expansion, but "memory" grown directly into the attention mechanism and trained end to end.

Why don't past solutions work?

RAG is essentially an "open-book exam". The model itself remembers nothing; it just flips through notes on the fly. Whether it finds the right information depends on retrieval quality, and speed depends on data volume. Once information is scattered across dozens of documents and requires cross-document reasoning, it falls apart.

Linear attention and KV-cache compression are essentially "compressed memory". The model remembers, but the harder you compress, the blurrier the memory gets, and details are lost over time.

MSA's approach is completely different:

→ No compression, no external plugins. Instead, teach the model to focus on what matters. The core is a scalable sparse attention architecture with linear complexity, so 10x more memory doesn't mean an explosion in computational cost.

→ The model knows which document each memory comes from, and when. Document-wise RoPE positional encoding lets the model naturally understand document boundaries and temporal order.

→ It can reason across fragmented information. A Memory Interleaving mechanism enables multi-hop reasoning across scattered memory fragments: not just finding one relevant record, but chaining clues together.

The results?
· Scales from 16K to 100M tokens with less than 9% accuracy degradation
· A 4B-parameter MSA model outperforms 235B-scale top RAG systems on long-context benchmarks
· 100M-token inference runs on just 2 A800s. This isn't lab-exclusive; it's startup-affordable.

Simply put, past large models were geniuses with goldfish memory. What MSA does is let them truly remember.

We put it on GitHub.
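The post doesn't spell out how MSA selects what to attend to, so this is not the paper's algorithm; as a rough illustration of the "focus on what matters" idea, here is one common family of sparse attention (coarse block-level selection followed by exact attention inside the chosen blocks), with all names hypothetical and NumPy used for clarity:

```python
import numpy as np

def block_topk_attention(q, K, V, block=4, k_blocks=2):
    """Hypothetical sketch: score cheap per-block summaries first, then run
    exact attention only inside the top-k scoring blocks. The per-query cost
    of the exact step depends on k_blocks * block, not on sequence length."""
    n, d = K.shape
    nb = n // block
    # Coarse summaries: mean key of each block.
    Kb = K[: nb * block].reshape(nb, block, d).mean(axis=1)
    # Pick the k_blocks blocks whose summaries score highest against q.
    top = np.argsort(Kb @ q)[-k_blocks:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in top])
    # Exact softmax attention restricted to the selected token indices.
    scores = (K[idx] @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
out = block_topk_attention(q, K, V)   # attends to 2 blocks of 4 tokens, not all 16
```

The design point this illustrates: because only a fixed number of blocks ever reach the expensive exact-attention step, growing the memory 10x grows only the cheap summary scan, which is the kind of trade-off a linear-complexity sparse architecture relies on.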
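The post names document-wise RoPE but not its exact formulation; a minimal sketch of the general idea (restarting the RoPE position counter at each document boundary so the model sees within-document offsets), with all function names hypothetical:

```python
import numpy as np

def document_wise_positions(doc_lengths):
    """Hypothetical sketch: restart the position index at every document
    boundary, so positional encoding reflects where a token sits inside
    its own document rather than in one global token stream."""
    return np.concatenate([np.arange(n) for n in doc_lengths])

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE: each position maps to rotation angles across dim/2
    # frequency bands; downstream, these rotate query/key pairs.
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, freqs)  # shape: (num_tokens, dim // 2)

# Three concatenated documents of lengths 4, 3, and 5:
pos = document_wise_positions([4, 3, 5])
# → [0 1 2 3 0 1 2 0 1 2 3 4]
angles = rope_angles(pos, dim=8)  # shape (12, 4)
```

Because the counter resets, tokens at the same offset in different documents share the same rotation, which is one plausible way an attention mechanism could be made aware of document boundaries; document order or timestamps would need a separate signal in a real system.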
Algorithm researchers, give it a star if you like it. 🌟👀🙏 github.com/EverMind-AI/MSA
→ View original post on X — @elliotchen100, 2026-04-02 07:24 UTC
