Mixture of Experts (MoE) is a powerful architecture and a key foundation for models like DeepSeek. SUES takes it a step further with Mixture of Routers (MoR), which applies the MoE idea to the routers themselves. MoR uses multiple sub-routers to jointly select experts, with a learnable main router that weights their outputs, and it is reported to perform impressively well.
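To make the idea concrete, here is a minimal NumPy sketch of one way such routing could work. It is an illustration under assumptions, not the authors' implementation: the class name `MixtureOfRouters`, the parameter names, the linear form of every router, and the choice to mix sub-router probabilities as a softmax-weighted average before top-k selection are all hypothetical. The actual MoR combination rule may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MixtureOfRouters:
    """Hypothetical sketch: several linear sub-routers score the experts,
    and a main router produces per-token weights over the sub-routers."""

    def __init__(self, d_model, num_experts, num_subrouters, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # One weight matrix per sub-router: (R, d_model, E).
        # Random here; in a real model these would be trained.
        self.w_sub = rng.normal(0.0, 0.02, size=(num_subrouters, d_model, num_experts))
        # Main router maps a token to one logit per sub-router: (d_model, R).
        self.w_main = rng.normal(0.0, 0.02, size=(d_model, num_subrouters))

    def route(self, x):
        # x: (T, d_model) batch of token representations.
        sub_logits = np.einsum("td,rde->tre", x, self.w_sub)   # (T, R, E)
        sub_probs = softmax(sub_logits, axis=-1)               # each sub-router's expert distribution
        main_w = softmax(x @ self.w_main, axis=-1)             # (T, R) weights over sub-routers
        # Assumed combination rule: weighted average of sub-router distributions.
        mixed = np.einsum("tr,tre->te", main_w, sub_probs)     # (T, E)
        # Pick the top-k experts per token and renormalize their gate values.
        top_idx = np.argsort(-mixed, axis=-1)[:, : self.top_k]
        top_p = np.take_along_axis(mixed, top_idx, axis=-1)
        gates = top_p / top_p.sum(axis=-1, keepdims=True)
        return top_idx, gates, mixed


if __name__ == "__main__":
    mor = MixtureOfRouters(d_model=8, num_experts=4, num_subrouters=3, top_k=2)
    tokens = np.random.default_rng(1).normal(size=(5, 8))
    expert_ids, gate_weights, _ = mor.route(tokens)
    print(expert_ids.shape, gate_weights.shape)
```

Because the main router's weights and each sub-router's output are both probability distributions, the mixed scores in this sketch also sum to one per token, so the top-k gates can be renormalized in the same way as in a standard single-router MoE layer.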
Mixture of Routers: Advanced MoE Architecture for DeepSeek