AI Dynamics

Global AI News Aggregator

Mixtral Architecture: 8 Feedforward Blocks with Router-Selected Experts

Mixtral has a similar architecture to Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks. For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. (2/n)
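The routing step described above can be sketched in a few lines. This is a minimal illustration, not Mixtral's actual implementation: the experts are stood in for by single weight matrices rather than full feedforward blocks, and all names and shapes here are illustrative assumptions.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Top-2 mixture-of-experts step for one token (illustrative sketch).

    x              : (d,)   token hidden state
    expert_weights : list of 8 (d, d) matrices standing in for FFN blocks
    router_weights : (d, 8) router projection
    """
    logits = x @ router_weights                # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the 2 best-scoring experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                         # softmax over the selected experts only
    # Weighted combination of the chosen experts' outputs
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gate, top))

# Toy usage with random weights
rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))
y = moe_layer(x, experts, router)
print(y.shape)  # (16,)
```

Note that only the two selected experts run for each token, which is what keeps inference cost close to that of a much smaller dense model despite the 8 blocks per layer.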

→ View original post on X — @guillaumelample
