Qwen3-Next-80B-A3B is out 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B.(esp. @ 32K+ context!) Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship. Qwen3-Next-80B-A3B-Thinking outperforms
Qwen3-Next-80B-A3B: Efficient MoE Model with Superior Performance
By
–
Leave a Reply