Yeah. But unless you want to have a generalist model for all languages, is it really that useful to have a MoE approach compared to say the CodeLlama-34B-Python version? I mean eventually things might move to MoE but maybe it’s not that big of a deal atm.
MoE vs Specialized Models: Questioning Current Practical Utility
By
–
Leave a Reply