AI Dynamics

Global AI News Aggregator


Kimi Linear: Hybrid Attention Architecture Reduces KV Cache by 75%

Kimi Linear introduces a hybrid linear attention architecture that interleaves Kimi Delta Attention (KDA) layers with periodic full attention layers at a 3:1 ratio. It is reported to outperform full attention while reducing KV cache by 75% and delivering 6× faster…
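The 75% figure follows from the 3:1 ratio if only the full attention layers keep a per-token KV cache. The sketch below shows that arithmetic under stated assumptions: the function names, the 16-layer stack, and the choice to place the full attention layer last in each group of four are hypothetical and not taken from the post, and any fixed-size state kept by the KDA layers is ignored.

```python
def layer_pattern(num_layers: int, kda_per_full: int = 3) -> list[str]:
    """Label each layer 'kda' or 'full' at a kda_per_full:1 ratio.

    Assumption: the full attention layer is the last one in each group.
    """
    period = kda_per_full + 1
    return ["full" if (i + 1) % period == 0 else "kda" for i in range(num_layers)]


def kv_cache_reduction(pattern: list[str]) -> float:
    """Fraction of per-token KV cache saved versus an all-full-attention stack.

    Assumption: every full attention layer stores the same amount of KV cache
    per token, and KDA layers store nothing that grows with sequence length.
    """
    full_layers = pattern.count("full")
    return 1.0 - full_layers / len(pattern)


# Hypothetical 16-layer stack, used only to illustrate the ratio.
pattern = layer_pattern(16)
print(pattern[:4])                  # ['kda', 'kda', 'kda', 'full']
print(kv_cache_reduction(pattern))  # 0.75
```

With one full attention layer in every four, a quarter of the layers hold a growing KV cache, which matches the 75% reduction stated above.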

→ View original post on X — @dair_ai