Tri Dao, the author of FlashAttention, has released a new paper called SonicMoE. It reports 1.86x higher MoE kernel throughput and 45% lower activation memory per layer on H100s, and introduces tile-aware routing that cuts padding waste for sparse MoEs. The paper is trending on alphaXiv.
SonicMoE: Tri Dao Advances MoE Kernel Efficiency on H100s
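To make "padding waste" concrete, here is a minimal sketch of the arithmetic, not SonicMoE's algorithm. It assumes a grouped GEMM that processes each expert's tokens in fixed-size row tiles, so every expert's token count is effectively rounded up to a multiple of the tile size. The tile size of 128 and the per-expert token counts are hypothetical values chosen for illustration.

```python
def padded_rows(tokens_per_expert, tile_m):
    """Round each expert's token count up to a multiple of tile_m.

    Assumption: the kernel works on whole tiles of tile_m rows, so a
    partially filled tile costs as much as a full one.
    """
    return [-(-n // tile_m) * tile_m for n in tokens_per_expert]


def padding_waste(tokens_per_expert, tile_m):
    """Fraction of processed rows that are padding rather than real tokens."""
    real = sum(tokens_per_expert)
    padded = sum(padded_rows(tokens_per_expert, tile_m))
    return (padded - real) / padded if padded else 0.0


if __name__ == "__main__":
    # Hypothetical routing result for four experts.
    counts = [130, 5, 64, 257]
    tile_m = 128  # assumed tile height, not taken from the paper

    print("padded rows per expert:", padded_rows(counts, tile_m))
    print(f"padding waste: {padding_waste(counts, tile_m):.1%}")
```

In this toy case, experts that receive only slightly more tokens than a tile boundary (130 or 257) pay for a nearly empty extra tile. A router that is aware of tile boundaries could, in principle, steer counts toward multiples of the tile size, which is the kind of waste the teaser says tile-aware routing reduces.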
