Comparing & Contrasting Recent LLM Architectures
> DeepSeek-V3/R1
> OLMo 2
> Gemma 3
> Mistral Small 3.1
> Llama 4
> Qwen3 (dense+MoE)
> SmolLM3
> Kimi 2
> GPT-OSS

Are 2025 LLMs really that different from each other? MoE, MLA, GQA, sliding window, normalization games & more.
2025 LLMs Compared: Architecture Differences and Technical Innovations