DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
https://arxiv.org/abs/2410.10819
https://github.com/mit-han-lab/duo-attention
… #MIT @songhan_mit