AI Dynamics

Global AI News Aggregator

DeepSeek V3.2 MLA Architecture with Sparse Attention

I'd say it's not standard MLA. Like V3.2, which pairs MLA with DeepSeek Sparse Attention, it's MLA but with CSA/HCA (i.e., not plain MLA, though they do keep the latent KV representation).

→ View original post on X — @rasbt
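
To make the "latent KV plus sparse attention" idea concrete, here is a minimal NumPy sketch, not DeepSeek's implementation. It assumes a single head, causal masking, and a made-up indexer projection (`W_idx`) that scores past tokens so each query attends only to its `top_k` highest-scoring ones. All names and shapes are hypothetical, and CSA/HCA are not modeled.

```python
import numpy as np

def sparse_latent_attention(h, W_dkv, W_uk, W_uv, W_q, W_idx, top_k):
    """Causal single-head attention over a compressed KV latent,
    restricted per query to the top_k tokens picked by an indexer.

    Illustrative assumption only: the weights and the indexer are hypothetical.
    """
    T = h.shape[0]
    c_kv = h @ W_dkv              # (T, d_latent): the only per-token state a cache would store
    k = c_kv @ W_uk               # keys reconstructed from the latent
    v = c_kv @ W_uv               # values reconstructed from the latent
    q = h @ W_q                   # (T, d_head)
    idx_scores = (h @ W_idx) @ c_kv.T   # (T, T) cheap relevance scores (hypothetical indexer)

    out = np.zeros_like(q)
    for t in range(T):
        visible = idx_scores[t, : t + 1]        # causal: only tokens 0..t
        keep = np.argsort(visible)[-top_k:]     # indices of the top_k scores (all if fewer exist)
        logits = q[t] @ k[keep].T / np.sqrt(q.shape[1])
        w = np.exp(logits - logits.max())       # numerically stable softmax
        w /= w.sum()
        out[t] = w @ v[keep]
    return out, c_kv

rng = np.random.default_rng(0)
T, d_model, d_latent, d_head = 6, 16, 4, 8
h = rng.normal(size=(T, d_model))
W_dkv = rng.normal(size=(d_model, d_latent))
W_uk = rng.normal(size=(d_latent, d_head))
W_uv = rng.normal(size=(d_latent, d_head))
W_q = rng.normal(size=(d_model, d_head))
W_idx = rng.normal(size=(d_model, d_latent))

out, c_kv = sparse_latent_attention(h, W_dkv, W_uk, W_uv, W_q, W_idx, top_k=2)
```

The point of the sketch is that the sparse selection changes which tokens are attended to, while keys and values are still derived from the same small latent `c_kv`, which is the part that stays MLA-like.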