AI Dynamics

Global AI News Aggregator

MoE Architecture: 5-10x More Parameters

The modern MoE architecture is insane:

> Mixtral 8x7B: 47B total params, only 13B active per token
> DeepSeek-V3: 671B params, 37B active – beats GPT-4 at 1/10th cost
> Grok-1: 314B params, trained faster than any dense model of similar quality

Pattern: 5-10x more parameters.
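As a quick sanity check on the figures above, here is a minimal sketch that computes the total-to-active parameter ratio for the two models where the post gives both numbers. The dictionary layout and the `sparsity_ratio` helper are illustrative names, not from the post, and Grok-1 is left out because its active parameter count is not quoted.

```python
# Parameter counts in billions, as quoted in the post: (total, active per token).
# Grok-1 is omitted because the post does not state its active parameter count.
models = {
    "Mixtral 8x7B": (47, 13),
    "DeepSeek-V3": (671, 37),
}


def sparsity_ratio(total_b: float, active_b: float) -> float:
    """Return how many times larger the full model is than the per-token active slice."""
    return total_b / active_b


ratios = {name: sparsity_ratio(total, active) for name, (total, active) in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x total vs. active parameters")
```

On these numbers the ratio comes out to roughly 3.6x for Mixtral 8x7B and roughly 18x for DeepSeek-V3, so the 5-10x figure is best read as a rough middle of the range rather than an exact rule.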

→ View original post on X — @godofprompt