Running 397B MoE Model on M3 Mac with Efficient Weight Streaming

Dan says he's got Qwen 3.5 397B-A17B, a MoE model that takes up 209 GB on disk, running on an M3 Mac at ~5.7 tokens per second using only 5.5 GB of active memory (!). He does it by quantizing the model and then streaming its weights from the SSD (at ~17 GB/s), which works because MoE models only use a small subset of their weights for each token.
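To make the "only a small subset of weights per token" idea concrete, here is a minimal sketch of the general technique. It is not Dan's implementation. It is a toy mixture-of-experts layer whose expert matrices live in a file on disk and are read on demand with `seek`/`read`, so each token touches only `TOP_K` of `NUM_EXPERTS` experts. The sizes, the file layout, the toy router and the `StreamingMoE` class are all hypothetical. A real engine also has to handle quantized formats, caching and far larger tensors, and none of that is modeled here.

```python
import math
import os
import struct
import tempfile

# Hypothetical toy sizes, chosen only for illustration.
DIM = 4          # hidden size
NUM_EXPERTS = 8  # experts stored on disk
TOP_K = 2        # experts activated per token
EXPERT_BYTES = DIM * DIM * 4  # one DIM x DIM float32 matrix per expert


def write_experts(path):
    """Write toy experts to disk: every entry of expert e equals (e + 1) / 10."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            f.write(struct.pack(f"<{DIM * DIM}f", *([(e + 1) / 10] * (DIM * DIM))))


class StreamingMoE:
    """Toy MoE layer: the router stays in RAM, expert weights are read from disk per token."""

    def __init__(self, path, router):
        self.f = open(path, "rb")
        self.router = router  # NUM_EXPERTS rows of DIM router weights
        self.bytes_read = 0   # how many expert bytes were actually pulled from disk

    def load_expert(self, e):
        # Seek straight to expert e and read only its bytes.
        self.f.seek(e * EXPERT_BYTES)
        data = self.f.read(EXPERT_BYTES)
        self.bytes_read += len(data)
        flat = struct.unpack(f"<{DIM * DIM}f", data)
        return [flat[r * DIM:(r + 1) * DIM] for r in range(DIM)]

    def forward(self, x):
        # Score every expert, keep the TOP_K best.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        top = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
        # Softmax over the selected experts only, used as mixing weights.
        m = max(logits[e] for e in top)
        exps = [math.exp(logits[e] - m) for e in top]
        total = sum(exps)
        out = [0.0] * DIM
        for e, g in zip(top, exps):
            W = self.load_expert(e)  # only selected experts ever touch the disk
            y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
            for i in range(DIM):
                out[i] += (g / total) * y[i]
        return out, top

    def close(self):
        self.f.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experts.bin")
    write_experts(path)
    # Toy router: expert e scores e * x[0], so higher-index experts win when x[0] > 0.
    router = [[float(e)] + [0.0] * (DIM - 1) for e in range(NUM_EXPERTS)]
    moe = StreamingMoE(path, router)
    try:
        out, top = moe.forward([1.0, 2.0, 3.0, 4.0])
    finally:
        moe.close()
    fraction = moe.bytes_read / (NUM_EXPERTS * EXPERT_BYTES)

print(top, fraction)  # [7, 6] 0.25
```

In this toy run only 2 of the 8 experts are read, i.e. 25% of the expert bytes. The same reasoning is behind the model's name: the "A17B" suffix denotes roughly 17B active parameters out of 397B total, so only around 4% of the weights are needed for any one token, which is what makes streaming the rest from the SSD workable.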