Fantastic in-depth guide about Microsoft MAI by @eliebakouch. Respect where respect is due. tl;dr about the model:
- Zero synthetic data or distillation from previous models.
- 1T model with 35B active, trained on 33.5T tokens.
Microsoft MAI Guide: 1T Model, 35B Active, No Synthetic Data
