AI Dynamics

Global AI News Aggregator

Running 397B MoE Model on M3 Mac with Efficient Weight Streaming

Dan says he has Qwen 3.5 397B-A17B, a MoE model that takes 209 GB on disk, running on an M3 Mac at ~5.7 tokens per second using only 5.5 GB of active memory (!). He does this by quantizing the model and then streaming weights from SSD (at ~17 GB/s), which works because MoE models only use a small subset of their weights for each token.
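To make those numbers concrete, here is a rough back-of-envelope sketch in Python. It uses only the figures quoted above and makes two simplifying assumptions that are not from the post: that 1 GB means 10^9 bytes, and that the on-disk size is spread evenly across all 397B parameters. The function names are made up for this illustration, and none of this describes how Dan's setup actually works.

```python
# Figures quoted in the post. GB is assumed to mean 1e9 bytes (an assumption).
MODEL_SIZE_GB = 209.0    # on-disk size of the quantized model
TOTAL_PARAMS_B = 397.0   # total parameters, in billions
ACTIVE_PARAMS_B = 17.0   # active parameters per token ("A17B"), in billions
SSD_GB_PER_S = 17.0      # quoted SSD streaming rate
TOKENS_PER_S = 5.7       # quoted generation speed


def bits_per_param(size_gb: float, params_b: float) -> float:
    """Average bits stored per parameter, assuming a uniform spread."""
    return size_gb * 8 / params_b


def active_gb_per_token(size_gb: float, total_b: float, active_b: float) -> float:
    """Size of the weights touched per token, at the average bit width."""
    return size_gb * active_b / total_b


def read_budget_gb_per_token(ssd_gb_per_s: float, tokens_per_s: float) -> float:
    """How much the SSD can deliver in the time it takes to emit one token."""
    return ssd_gb_per_s / tokens_per_s


if __name__ == "__main__":
    bits = bits_per_param(MODEL_SIZE_GB, TOTAL_PARAMS_B)
    active = active_gb_per_token(MODEL_SIZE_GB, TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    budget = read_budget_gb_per_token(SSD_GB_PER_S, TOKENS_PER_S)
    print(f"average bits per parameter:  {bits:.2f}")
    print(f"active weights per token:    {active:.2f} GB")
    print(f"SSD read budget per token:   {budget:.2f} GB")
```

Under these assumptions the model averages a little over 4 bits per parameter, and the active weights for one token come to more than the SSD could deliver in one token's time at the quoted rate. That would be consistent with part of the active weights staying resident or cached in memory rather than being re-read for every token, though the post does not say so.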

→ View original post on X (@simonw)
