The idea of MoE/routing for LMs has a long history (e.g. Fig 4 of this 2010 multimodal PAQ LM https://arxiv.org/pdf/1108.3298.pdf), but execution, not just the idea, matters a lot, and we should welcome every advance. Matt Mahoney attributes the idea to @SchmidhuberAI.
Mixture of Experts Routing History in Language Models