Turns out you can run enormous Mixture-of-Experts (MoE) models on Mac hardware without fitting the whole model in RAM. Each generated token only uses the few experts the router selects, so their weights can be streamed from the SSD on demand. People keep finding ways to push this to bigger models: Kimi 2.5 has 1T parameters in total but only 32B active per token, so it fits on a machine with 96GB of RAM.
Running 1T MoE Models on Mac Hardware via Expert Streaming
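To make the idea concrete, here is a minimal Python sketch of per-token expert streaming under toy assumptions. Each expert is stored as one fixed-size float32 block in a single file, and a small LRU cache keeps a few experts in RAM while misses are read from disk with a seek. The names `ExpertCache`, `moe_step`, `write_experts`, `weight_gb`, and `run_demo`, the LRU policy, and the 4-bits-per-weight figure are hypothetical illustrations, not the method used by any particular project. Real systems may instead rely on `mmap` and the OS page cache. The `weight_gb` helper shows the rough arithmetic: at an assumed 4 bits per weight, 1T parameters need about 500 GB, while 32B active parameters need about 16 GB.

```python
import os
import struct
import tempfile
from collections import OrderedDict

FLOAT_BYTES = 4  # toy experts are stored as little-endian float32


def weight_gb(params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def write_experts(path, num_experts, expert_size):
    """Write each expert as one contiguous block; expert e is filled with e."""
    with open(path, "wb") as f:
        for e in range(num_experts):
            f.write(struct.pack(f"<{expert_size}f", *([float(e)] * expert_size)))


class ExpertCache:
    """Hypothetical LRU cache: at most `capacity` experts stay in RAM,
    and every miss is read from the file on disk."""

    def __init__(self, path, expert_size, capacity):
        self.f = open(path, "rb")
        self.expert_size = expert_size
        self.capacity = capacity
        self.cache = OrderedDict()  # expert id -> weights, oldest first
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        nbytes = self.expert_size * FLOAT_BYTES
        self.f.seek(expert_id * nbytes)  # expert blocks are fixed-size
        weights = struct.unpack(f"<{self.expert_size}f", self.f.read(nbytes))
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return weights

    def close(self):
        self.f.close()


def moe_step(x, router_scores, cache, top_k):
    """Route one token to its top_k experts and mix their outputs.
    Each toy expert is a single weight row; its output is dot(w, x)."""
    chosen = sorted(range(len(router_scores)),
                    key=lambda e: router_scores[e], reverse=True)[:top_k]
    total = sum(router_scores[e] for e in chosen)
    out = 0.0
    for e in chosen:
        w = cache.get(e)  # only the chosen experts are ever loaded
        out += router_scores[e] / total * sum(wi * xi for wi, xi in zip(w, x))
    return out, chosen


def run_demo():
    tokens = [  # hypothetical router scores over 8 experts, one row per token
        [0.6, 0.3, 0.1, 0, 0, 0, 0, 0],
        [0.5, 0.1, 0, 0, 0, 0, 0, 0.4],
        [0, 0, 0, 0.7, 0, 0, 0, 0.3],
    ]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "experts.bin")
        write_experts(path, num_experts=8, expert_size=4)
        cache = ExpertCache(path, expert_size=4, capacity=3)
        results = [moe_step([1.0] * 4, s, cache, top_k=2) for s in tokens]
        reads = cache.disk_reads
        cache.close()
    return results, reads


if __name__ == "__main__":
    print(weight_gb(1e12, 4), weight_gb(32e9, 4))  # 500.0 16.0
    results, reads = run_demo()
    print([c for _, c in results], reads)  # [[0, 1], [0, 7], [3, 7]] 4
```

With room for three experts, the three tokens make six expert lookups but only four disk reads, because experts 0 and 7 are still cached when they are reused. LRU is just one plausible eviction policy. The general point is that RAM only has to hold the experts in current use, not all of them.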