AI Dynamics

Global AI News Aggregator

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
https://arxiv.org/abs/2410.10819
https://github.com/mit-han-lab/duo-attention
… #MIT @songhan_mit

→ View original post on X: @jiqizhixin