DeepSeek V3.2 MLA Architecture with Sparse Attention
I'd say it's not standard MLA, but, as with MLA + DeepSeek Sparse Attention in V3.2, it's MLA with CSA/HCA (i.e., not plain MLA, but they do keep the latent KV representation).
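To make "keep the latent KV representation, but attend sparsely" concrete, here's a rough toy sketch, not DeepSeek's actual implementation. Everything in it is an assumption for illustration: the names (`W_down`, `W_uk`, `W_uv`, `index_scores`, `sparse_latent_attention`), the sizes, and the random `index_scores`, which just stand in for whatever mechanism scores tokens for selection. It also skips the RoPE handling and multi-head details of real MLA.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix, scaled so outputs stay roughly unit-sized."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_latent_attention(query, latent_cache, W_uk, W_uv, index_scores, top_k):
    """Attend over only the top_k cached positions, ranked by index_scores.

    latent_cache holds one small latent vector per token; keys and values
    are rebuilt from those latents only for the selected positions.
    """
    k = min(top_k, len(latent_cache))
    selected = sorted(range(len(latent_cache)),
                      key=lambda i: index_scores[i], reverse=True)[:k]
    selected.sort()  # restore positional order

    # Up-project only the selected latents into keys and values.
    keys = [matvec(W_uk, latent_cache[i]) for i in selected]
    values = [matvec(W_uv, latent_cache[i]) for i in selected]

    # Ordinary scaled dot-product attention over the selected subset.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * kk for q, kk in zip(query, key))
                       for key in keys])
    d_v = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return out, selected


rng = random.Random(0)
d_model, d_latent, d_head, seq_len = 16, 4, 8, 10  # toy sizes (assumed)

W_down = rand_matrix(d_latent, d_model, rng)  # hidden state -> latent
W_uk = rand_matrix(d_head, d_latent, rng)     # latent -> key
W_uv = rand_matrix(d_head, d_latent, rng)     # latent -> value

hidden = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]
latent_cache = [matvec(W_down, h) for h in hidden]  # the only per-token cache

query = [rng.gauss(0.0, 1.0) for _ in range(d_head)]
index_scores = [rng.random() for _ in range(seq_len)]  # placeholder scores

out, selected = sparse_latent_attention(
    query, latent_cache, W_uk, W_uv, index_scores, top_k=3)
print(len(latent_cache[0]), len(selected), len(out))
```

The point is just that the cache stores `d_latent` numbers per token instead of full keys and values, and the sparse part only changes which cached positions get up-projected and attended to.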