AI Dynamics

Global AI News Aggregator

Mixtral: 12B Speed with 45B Parameter Access via Expert Selection

Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, Mixtral decodes at the speed of a 12B model, while effectively having access to 45B parameters. (3/n)
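The routing idea can be illustrated with a small sketch. This is a toy illustration in plain Python, not Mixtral's actual implementation: the sizes (`NUM_EXPERTS`, `DIM`), the random weights, and the helper names (`route`, `moe_layer`, `make_matrix`) are all hypothetical, and each "expert" is reduced to a single matrix. It assumes the gate keeps the two highest-scoring experts per token and normalizes their scores with a softmax.

```python
import math
import random

NUM_EXPERTS = 8   # assumption: toy expert count for illustration
TOP_K = 2         # each token is routed to two experts, as in the text
DIM = 4           # hypothetical toy hidden size

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def make_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def route(router, token):
    """Return (expert_index, weight) pairs for the top-k experts of one token."""
    logits = matvec(router, token)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:TOP_K]
    weights = softmax([logits[i] for i in top])  # normalize over the selected experts only
    return list(zip(top, weights))

def moe_layer(router, experts, token):
    """Weighted sum of the outputs of the selected experts; the others are never evaluated."""
    out = [0.0] * DIM
    chosen = route(router, token)
    for idx, weight in chosen:
        expert_out = matvec(experts[idx], token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in chosen]

rng = random.Random(0)
router = make_matrix(rng, NUM_EXPERTS, DIM)
experts = [make_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(6)]

used = set()
for t, token in enumerate(tokens):
    _, chosen = moe_layer(router, experts, token)
    used.update(chosen)
    print(f"token {t}: experts {chosen}")

# Per-token compute scales with TOP_K experts, while capacity scales with NUM_EXPERTS.
params_per_expert = DIM * DIM
total_expert_params = NUM_EXPERTS * params_per_expert
active_expert_params = TOP_K * params_per_expert
print(f"expert parameters used per token: {active_expert_params} of {total_expert_params}")
print(f"distinct experts used across the sequence: {len(used)}")
```

In this sketch, each token evaluates only two expert matrices, so the per-token cost stays fixed, while different tokens may pick different pairs and so draw on more of the expert pool over a sequence. The sketch covers only the expert layers; shared components such as attention are not modeled.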

→ View original post on X: @guillaumelample
