We rebuilt how MoE models generate tokens on Blackwell GPUs, achieving 1.84x faster inference with more accurate outputs. These improvements feed directly into how we train Composer, letting us ship improved versions of the model more often.
MoE Token Generation 1.84x Faster on Blackwell GPUs