As I have mentioned before, stop trying to get dense models running on the DGX Spark or Mac Studios. Unified memory is the best fit for MoEs, because each token passes through only a small subset of the model's parameters. Optimize for your hardware. x.com/LeTechLead/sta…
Use MoE Models for Unified Memory Hardware Like DGX Spark
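The reasoning behind this advice can be sketched with rough arithmetic. During token generation, the weights that a token actually passes through have to be read from memory, so a common back-of-the-envelope upper bound on decode speed is memory bandwidth divided by the bytes of active weights. The sketch below is illustrative only: the bandwidth figure, the 4-bit quantization, and both model sizes are assumptions chosen for the example, not measurements or specifications taken from the post.

```python
# Back-of-the-envelope decode speed estimate, assuming generation is limited
# by how fast the active weights can be read from memory once per token.
# All numbers below are illustrative assumptions, not benchmarks.

def decode_tokens_per_second(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper-bound tokens/s if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token

def weights_footprint_gb(total_params_billion, bytes_per_param):
    """Memory needed to hold all weights (ignores KV cache and activations)."""
    return total_params_billion * bytes_per_param

BANDWIDTH_GB_S = 273.0   # assumed unified-memory bandwidth
BYTES_PER_PARAM = 0.5    # assumed 4-bit quantized weights

# Hypothetical dense model: all 70B parameters are active for every token.
dense_tps = decode_tokens_per_second(70, BYTES_PER_PARAM, BANDWIDTH_GB_S)
dense_gb = weights_footprint_gb(70, BYTES_PER_PARAM)

# Hypothetical MoE model: 120B parameters in total, about 5B active per token.
moe_tps = decode_tokens_per_second(5, BYTES_PER_PARAM, BANDWIDTH_GB_S)
moe_gb = weights_footprint_gb(120, BYTES_PER_PARAM)

print(f"dense: {dense_gb:.0f} GB of weights, at most {dense_tps:.1f} tokens/s")
print(f"MoE:   {moe_gb:.0f} GB of weights, at most {moe_tps:.1f} tokens/s")
```

Under these assumptions the MoE model needs more memory to hold its weights but reads far fewer bytes per token, which is the trade that large-capacity, moderate-bandwidth unified memory favors. Real throughput will be lower than these bounds and depends on the runtime, quantization, and context length.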
