Why this matters for open source:

Dense models: Entire model needs retraining if you want to change anything
MoE models: Swap experts, add capabilities, fine-tune components independently

Meta released Llama 405B (dense) – $50M+ training cost
DeepSeek released V3 (MoE) – $5.6M,
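
To put the two quoted figures side by side, here is a minimal sketch of the arithmetic. It takes the numbers exactly as stated above and does not verify them; because the dense figure is given as "$50M+", the result should be read as a lower bound. The constant and function names are illustrative only.

```python
# Training-cost figures as quoted in the text above (USD, millions).
# The dense figure is stated as "$50M+", so it is treated as a lower bound.
DENSE_COST_MUSD = 50.0  # Llama 405B (dense), lower bound
MOE_COST_MUSD = 5.6     # DeepSeek V3 (MoE)


def cost_ratio(dense_musd: float, moe_musd: float) -> float:
    """Return how many times more the dense run cost than the MoE run."""
    if moe_musd <= 0:
        raise ValueError("MoE cost must be positive")
    return dense_musd / moe_musd


ratio = cost_ratio(DENSE_COST_MUSD, MOE_COST_MUSD)
print(f"Dense / MoE cost ratio: at least {ratio:.1f}x")
```

On these figures the dense run cost roughly nine times as much as the MoE run. The comparison only covers the headline training costs quoted here, not research, data, or inference costs.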
MoE vs Dense Models: Cost Efficiency
