AI Dynamics

Global AI News Aggregator

Running 1T MoE Models on Mac Hardware via Expert Streaming

It turns out you can run enormous Mixture-of-Experts (MoE) models on Mac hardware without fitting the whole model in RAM, by streaming a subset of expert weights from SSD for each generated token. People keep finding ways to run bigger models: Kimi 2.5 has 1T parameters, but only 32B are active per token, so it fits in 96GB of RAM.
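To make the idea concrete, here is a minimal toy sketch of expert streaming, not the implementation the post refers to. It assumes expert weights sit in one flat file of fixed-size blocks, that a router picks the top-k experts per token, and that only a few experts stay resident in RAM under a least-recently-used policy. Every name here (`ExpertStore`, `route`, `write_toy_experts`, `demo`) and every size is hypothetical and chosen only for illustration.

```python
import os
import struct
import tempfile
from collections import OrderedDict


class ExpertStore:
    """Hypothetical store: reads fixed-size expert blocks from disk on demand
    and keeps at most `max_resident` of them in RAM (LRU eviction)."""

    def __init__(self, path, expert_size, max_resident):
        self.path = path
        self.expert_size = expert_size      # float32 values per expert
        self.max_resident = max_resident    # experts allowed in RAM at once
        self.cache = OrderedDict()          # expert_id -> weights, oldest first
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            # Cache hit: mark as most recently used, no disk access.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        # Cache miss: seek to this expert's block and read only that block.
        nbytes = self.expert_size * 4
        with open(self.path, "rb") as f:
            f.seek(expert_id * nbytes)
            weights = struct.unpack(f"{self.expert_size}f", f.read(nbytes))
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.max_resident:
            self.cache.popitem(last=False)  # evict the least recently used
        return weights


def route(scores, top_k):
    """Return the ids of the top_k experts by router score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def write_toy_experts(path, num_experts, expert_size):
    """Write toy weights: expert e is a block filled with the value e."""
    with open(path, "wb") as f:
        for e in range(num_experts):
            f.write(struct.pack(f"{expert_size}f", *([float(e)] * expert_size)))


def demo():
    num_experts, expert_size = 8, 4
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_toy_experts(path, num_experts, expert_size)
        store = ExpertStore(path, expert_size, max_resident=2)
        # Made-up router scores for three consecutive tokens.
        tokens = [
            [0.1, 0.9, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0],
            [0.0, 0.7, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0],
            [0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0],
        ]
        chosen = []
        for scores in tokens:
            ids = route(scores, top_k=2)
            for expert_id in ids:
                store.get(expert_id)        # streams from disk only on a miss
            chosen.append(ids)
        return chosen, store.disk_reads, len(store.cache)


if __name__ == "__main__":
    print(demo())
```

In this toy run the second token reuses the experts already loaded for the first, so it triggers no disk reads, while the third token selects two new experts and evicts the old ones. A real system would also keep the shared, non-expert weights resident and would read quantized tensors rather than raw floats; those details are outside this sketch.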

→ View original post on X: @simonw
