It seems everyone is interested in memory, which reinforces our determination to do EverMind well.

First, let me correct a misconception: it's not that Claude's engineering is poor. Anthropic's engineering capability is beyond question; the fact that 200 lines of MEMORY.md can achieve this effect actually proves that Opus has a solid foundation. But "the model is strong enough to carry the solution" does not mean "the solution itself is good enough."

@bigthing123456 hit the nail on the head: the bottleneck isn't retrieval, it's deciding what the model should remember, and when. Claude's current approach lets AutoDream scan, merge, and trim by rule, essentially outsourcing memory management to an offline process. The problem is that rules are static while contextual importance is dynamic. A detail you deemed unimportant last week might be exactly the key to today's debugging session. Once trimmed, it's gone.

@boyuan_chen mentioned using daily files plus semantic search for a three-layer separation, which is already much better than the native solution. But fundamentally it's still patching at the application layer: you have to design what goes into MEMORY.md, what goes into daily files, and what goes through search, and those decisions themselves require understanding context. Using an LLM to manage an LLM's memory is recursive.

Our approach with EverMind MSA is completely different:
Instead of adding a memory module at the application layer, we modify the attention mechanism itself. We let the model learn content-aware sparse routing of historical information inside the Transformer: which tokens should be preserved long-term and which should decay. This is learned by the model itself during training, not decided by an external script. That doesn't mean our solution is perfect; MSA currently faces its own challenges, such as training cost and generalization in long-tail scenarios. But directionally, memory should be a native capability of the model, not a 200-line markdown file.

We are open-sourcing MSA's inference code this week. If you like it, please star it: github.com/EverMind-AI/MSA

— Elliott (@elliotchen100)

I looked at Claude's memory mechanism, and it's nothing special. The core of the entire memory system is a single MEMORY.md file, no more than 200 lines, injected into the context at the start of each session. What if there's too much memory? A background subprocess called AutoDream periodically scans, merges, and trims it to keep it within the limit. In short: the model can't remember on its own, so it simulates memory with the file system plus LLM self-management. The solution is solid engineering, but it has several fundamental limitations:

1. Storage and retrieval depend entirely on the file system and Markdown, and cannot scale to cross-project, cross-agent scenarios; memory stays siloed
2. No true semantic indexing and no relevance-based dynamic recall; the 200-line cap is a hard limit
3. AutoDream's consolidation is rule-driven (scan, merge, trim), not cognitively driven—it can deduplicate and compress, but cannot extract new insights from experience
4. No forgetting curve and no memory-reinforcement mechanism; a memory either exists or is deleted, with no middle ground

After working on memory for a while, you realize that the ceiling of such solutions isn't engineering, it's architecture. As long as the model's attention mechanism doesn't natively support efficient retrieval over a large historical context, the application layer will forever be patching. This is why we chose a different path at EverMind. The MSA (Memory Sparse Attention) we released recently implements content-aware sparse routing directly in the Transformer attention layer, letting the model itself learn what to remember and what to ignore, rather than relying on external scripts to decide for it. Anthropic's engineering capability is undoubtedly top-tier. But this leak happens to demonstrate…
→ View original post on X — @elliotchen100, 2026-04-02 00:19 UTC
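For readers curious what "content-aware sparse routing" might look like in miniature, here is a toy NumPy sketch. This is not EverMind's actual implementation; the retention vector `w_retain` and the hard top-k selection are illustrative assumptions (a real model would learn these weights during training and would likely use a differentiable or decaying selection rather than a hard cutoff). The idea it demonstrates: score each history token by its content alone, keep only the top-k as persistent "memory," and run ordinary attention over that subset.

```python
import numpy as np

def content_aware_sparse_attention(history, query, w_retain, k=4):
    """Toy content-aware sparse attention over a token history.

    history: (T, d) array of history token embeddings
    query:   (d,) query embedding
    w_retain: (d,) content-scoring vector (stand-in for learned weights)
    """
    # Score every history token by its content alone, independent of the query.
    scores = history @ w_retain                     # (T,)
    keep = np.argsort(scores)[-k:]                  # indices of the k highest-scoring tokens
    kept = history[keep]                            # (k, d) sparse "memory" the query can see
    # Standard scaled dot-product attention, but only over the kept tokens.
    logits = kept @ query / np.sqrt(history.shape[1])
    weights = np.exp(logits - logits.max())         # numerically stable softmax
    weights /= weights.sum()
    return weights @ kept, keep

# Demo: 6 one-hot history tokens in an 8-dim space.
history = np.eye(6, 8)
w_retain = np.arange(8.0)          # here, retention score simply grows with token index
query = np.zeros(8)
query[3] = 1.0                     # query "asks about" token 3
out, keep = content_aware_sparse_attention(history, query, w_retain, k=4)
```

In the demo, tokens 0 and 1 are routed out of memory entirely (they never enter the attention computation), while the query attends most strongly to the surviving token it matches, token 3. The contrast with a MEMORY.md-style approach is that nothing here is an external rule: in a trained model, `w_retain` would be parameters shaped by gradient descent, so "what to keep" is itself learned.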