As I have mentioned before, stop trying to get dense models running on the DGX Spark or Mac Studio. Unified memory is the best fit for MoEs, because each token only goes through a small subset of the model's parameters. Optimize for your hardware.
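To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decoding is limited by memory bandwidth, so the ceiling on tokens per second is roughly bandwidth divided by the bytes of weights read per token. Every figure below (the bandwidth, both model sizes, the 8-bit weights) is a made-up placeholder, not a measurement of any real machine or model, and the estimate ignores KV-cache reads, compute limits, and routing overhead.

```python
# Rough upper bound on decode speed when weights must be streamed
# from memory for every generated token (bandwidth-bound assumption).

def decode_tokens_per_second(bandwidth_gb_s, active_params_billions, bytes_per_param):
    """Bandwidth divided by bytes of weights touched per token."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


def weight_footprint_gb(total_params_billions, bytes_per_param):
    """Memory needed to hold all weights, active or not."""
    return total_params_billions * bytes_per_param


# Hypothetical placeholders, chosen only to illustrate the ratio.
BANDWIDTH_GB_S = 250.0   # assumed unified-memory bandwidth
BYTES_PER_PARAM = 1.0    # assumed 8-bit quantized weights

DENSE_TOTAL_B = 70.0     # dense model: every parameter is active per token
MOE_TOTAL_B = 100.0      # MoE model: all experts must fit in memory...
MOE_ACTIVE_B = 10.0      # ...but only this many parameters run per token

dense_tps = decode_tokens_per_second(BANDWIDTH_GB_S, DENSE_TOTAL_B, BYTES_PER_PARAM)
moe_tps = decode_tokens_per_second(BANDWIDTH_GB_S, MOE_ACTIVE_B, BYTES_PER_PARAM)

print(f"dense: {weight_footprint_gb(DENSE_TOTAL_B, BYTES_PER_PARAM):.0f} GB, "
      f"~{dense_tps:.1f} tok/s ceiling")
print(f"MoE:   {weight_footprint_gb(MOE_TOTAL_B, BYTES_PER_PARAM):.0f} GB, "
      f"~{moe_tps:.1f} tok/s ceiling")
```

Under these assumptions the MoE needs more memory than the dense model but has a much higher speed ceiling, because only its active parameters are read per token. That is the trade unified-memory boxes are built for: lots of capacity, modest bandwidth.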
Use MoE Models on Unified Memory Hardware Like DGX Spark
