Why has scaling Diffusion Transformers with Mixture-of-Experts been so tricky for visual data? Researchers from Fudan University, Alibaba Group's Tongyi Lab, Zhejiang University, The University of Hong Kong, and MMLab tackle this question head-on. They introduce ProMoE, an MoE framework for scaling Diffusion Transformers on visual data.
ProMoE: Scaling Diffusion Transformers with MoE for Visual Data