Mixtral: 12B Speed with 45B Parameter Access via Expert Selection

AI Dynamics

Global AI News Aggregator


11 December 2023 15h20

Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, Mixtral decodes at the speed of a 12B model, while effectively having access to 45B parameters. (3/n)
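The mechanism in the post can be illustrated with a toy sparse mixture-of-experts layer. This is a minimal sketch and not Mixtral's implementation. It assumes 8 experts per layer, a linear router whose two largest logits are softmax-normalized into gate weights, and tiny linear maps standing in for the real feed-forward experts. All names (`top_k_route`, `moe_layer`, `param_counts`) are hypothetical. It shows a single layer in which each token can pick a different pair of experts, while only two experts actually run per token.

```python
import math
import random

NUM_EXPERTS = 8  # assumption: 8 experts per MoE layer (toy setting)
TOP_K = 2        # each token is routed to two experts, as in the post
DIM = 4          # toy hidden size


def make_linear(rng, rows, cols):
    """Random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def top_k_route(router_w, x, k=TOP_K):
    """Return (expert indices, gate weights) for one token.

    Indices are ordered from highest to lowest router logit; the gate
    weights are a softmax over only the k selected logits.
    """
    logits = matvec(router_w, x)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def moe_layer(router_w, experts, x):
    """Sparse MoE forward pass for one token: only the k chosen experts run."""
    chosen, gates = top_k_route(router_w, x)
    out = [0.0] * len(x)
    for idx, g in zip(chosen, gates):
        y = matvec(experts[idx], x)  # toy expert: a single linear map
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen


def param_counts(shared, per_expert, num_experts=NUM_EXPERTS, k=TOP_K):
    """Return (parameters stored, parameters used per token)."""
    return shared + num_experts * per_expert, shared + k * per_expert


if __name__ == "__main__":
    rng = random.Random(0)
    router_w = make_linear(rng, NUM_EXPERTS, DIM)
    experts = [make_linear(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(5)]
    for t, x in enumerate(tokens):
        _, chosen = moe_layer(router_w, experts, x)
        print(f"token {t}: experts {chosen}")  # the pair can change per token
    # Hypothetical split (in billions), chosen only to reproduce 45B / 12B
    total, active = param_counts(shared=1.0, per_expert=5.5)
    print(f"total = {total}B, active per token = {active}B")
```

The 1B shared / 5.5B per-expert split on the last lines is hypothetical: it is simply the split that reproduces the post's 45B and 12B figures if every non-expert weight is shared and each token uses two of eight experts. Per-token compute scales with the active count, while memory must still hold all of the stored parameters.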

Source: original post on X by @guillaumelample, 11 December 2023.

