torchtitan has an MoE impl that supports grouped mm and composes with FSDP: https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/moe.py
… it needs the latest torch version though (2.8), which flash-attn doesn't have a wheel for yet 🙁
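For anyone unfamiliar with what "grouped mm" buys you here: in an MoE layer, tokens are routed to different experts, and instead of looping over experts with separate matmuls, a grouped matmul applies each expert's weight matrix to its contiguous slice of tokens in one fused call. Below is a minimal NumPy sketch of the computation being fused; this is an illustration of the semantics, not torchtitan's actual code (which uses PyTorch's grouped-mm kernels), and the function name `grouped_mm` and the offsets convention are assumptions for the example.

```python
import numpy as np

def grouped_mm(tokens, expert_weights, offsets):
    """Reference semantics of a grouped matmul for MoE expert dispatch.

    tokens:         (num_tokens, d_in), pre-sorted so each expert's tokens
                    occupy one contiguous slice
    expert_weights: (num_experts, d_in, d_out), one weight matrix per expert
    offsets:        end index of each expert's token slice (cumulative)

    A fused grouped-mm kernel computes all of these slice-by-slice matmuls
    in a single launch; here we spell them out as a plain loop.
    """
    outputs = np.empty((tokens.shape[0], expert_weights.shape[2]))
    start = 0
    for expert_idx, end in enumerate(offsets):
        outputs[start:end] = tokens[start:end] @ expert_weights[expert_idx]
        start = end
    return outputs

# Example: 5 tokens routed to 2 experts (first 3 to expert 0, last 2 to expert 1)
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))      # 5 tokens, d_in = 4
w = rng.standard_normal((2, 4, 8))   # 2 experts, d_in = 4, d_out = 8
out = grouped_mm(x, w, offsets=[3, 5])
```

Sorting tokens by expert first is what makes the per-expert slices contiguous, which is the layout grouped-mm kernels rely on.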