9. Artificial Hippocampus Networks: Adds a fixed-size recurrent memory to sliding-window Transformers. Key-value (KV) pairs evicted from the window are compressed into an RNN-like state (Mamba2, DeltaNet, or GatedDeltaNet) trained via self-distillation, giving long-context efficiency with a constant-size cache and near-linear compute.
Artificial Hippocampus Networks: RNN Memory for Transformers
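The summary above describes a cache that stays constant in size because entries leaving the sliding window are folded into a fixed-size state rather than kept. The toy sketch below illustrates only that bookkeeping. The class name `WindowWithMemory`, its parameters, and the decayed outer-product update are assumptions made for illustration; the update is a plain linear-attention-style rule standing in for Mamba2, DeltaNet, or GatedDeltaNet, and the self-distillation training is not shown.

```python
from collections import deque


class WindowWithMemory:
    """Toy sliding-window KV cache plus a fixed-size recurrent memory.

    Hypothetical illustration: the memory update below is a simple decayed
    outer-product rule, not the Mamba2/DeltaNet/GatedDeltaNet modules.
    """

    def __init__(self, window, dim, decay=0.9):
        self.window = window          # max number of exact (key, value) pairs kept
        self.dim = dim                # key/value dimensionality
        self.decay = decay            # assumed forgetting factor for the memory
        self.cache = deque()          # exact KV pairs inside the window
        self.state = [[0.0] * dim for _ in range(dim)]  # dim x dim memory

    def _compress(self, key, value):
        # Fold one evicted pair into the state: S <- decay * S + key (outer) value.
        for i in range(self.dim):
            for j in range(self.dim):
                self.state[i][j] = self.decay * self.state[i][j] + key[i] * value[j]

    def append(self, key, value):
        # Add the newest pair; if the window overflows, evict the oldest
        # pair into the recurrent memory instead of discarding it.
        self.cache.append((key, value))
        if len(self.cache) > self.window:
            old_key, old_value = self.cache.popleft()
            self._compress(old_key, old_value)

    def read_memory(self, query):
        # Read the compressed history as query^T S (a length-dim vector).
        return [
            sum(query[i] * self.state[i][j] for i in range(self.dim))
            for j in range(self.dim)
        ]


# Usage: five tokens through a window of two leave two exact pairs
# in the cache and three pairs summarized in the dim x dim state.
mem = WindowWithMemory(window=2, dim=2)
for t in range(5):
    mem.append([1.0, 0.0], [float(t), 1.0])
summary = mem.read_memory([1.0, 0.0])
```

However many tokens are appended, storage stays at `window` exact pairs plus one `dim x dim` state, which is the constant-cache property the summary refers to.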