AI Dynamics

Global AI News Aggregator


Running 397B MoE Model on M3 Mac with Efficient Weight Streaming

Dan says he's got Qwen 3.5 397B-A17B – a 209 GB on-disk MoE model – running on an M3 Mac at ~5.7 tokens per second using only 5.5 GB of active memory (!) by quantizing and then streaming weights from SSD (at ~17 GB/s), since MoE models only use a small subset of their weights for
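The figures quoted above allow a rough back-of-envelope check. The sketch below is only arithmetic on the numbers in the post; it assumes decimal gigabytes, reads "397B" and "A17B" as total and active parameter counts, and assumes (purely for illustration) that the quantization is uniform across all weights. The function names are hypothetical and nothing here reflects how the actual setup is implemented.

```python
# Back-of-envelope arithmetic using only the figures quoted in the post.
# Assumptions (not from the post): decimal GB, uniform quantization.

DISK_SIZE_GB = 209.0      # model size on disk
TOTAL_PARAMS_B = 397.0    # total parameters, in billions
ACTIVE_PARAMS_B = 17.0    # active parameters per token, in billions
SSD_GB_PER_S = 17.0       # approximate SSD read throughput
TOKENS_PER_S = 5.7        # approximate generation speed


def bits_per_param(disk_gb: float, total_params_b: float) -> float:
    """Average bits stored per parameter, given on-disk size."""
    return disk_gb * 8 / total_params_b


def active_gb_per_token(disk_gb: float, total_b: float, active_b: float) -> float:
    """Size of the active weights per token if quantization were uniform."""
    return disk_gb * active_b / total_b


def ssd_budget_gb_per_token(ssd_gb_per_s: float, tokens_per_s: float) -> float:
    """Upper bound on data readable from SSD per generated token."""
    return ssd_gb_per_s / tokens_per_s


if __name__ == "__main__":
    bpp = bits_per_param(DISK_SIZE_GB, TOTAL_PARAMS_B)
    active = active_gb_per_token(DISK_SIZE_GB, TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    budget = ssd_budget_gb_per_token(SSD_GB_PER_S, TOKENS_PER_S)
    print(f"average bits per parameter: {bpp:.2f}")
    print(f"active weights per token:   {active:.2f} GB")
    print(f"SSD read budget per token:  {budget:.2f} GB")
```

Under these assumptions the active weights per token come out larger than what the SSD could deliver per token at the quoted speed. That would suggest some of the active weights stay resident in memory or are reused between tokens, or that the uniform-quantization assumption does not hold; the post does not say which.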

→ View original post on X — @simonw