Understanding Mixture of Experts Architecture and FFN Weights
By
–
Global AI News Aggregator
Based on how MoEs work, I believe this is possible. Each expert is essentially a feed-forward network (FFN) with its own weights.
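To make the "each expert is an FFN with its own weights" point concrete, here is a minimal sketch of a single MoE layer in plain Python. Everything in it is an assumption made for illustration: the helper names (`make_ffn`, `ffn_forward`, `moe_forward`), the tiny dimensions, the ReLU activation, and the top-2 routing with softmax renormalization over the chosen experts. Real MoE models differ in these details, so treat it as a picture of the idea rather than any particular model's implementation.

```python
import math
import random

def make_ffn(d_model, d_hidden, rng):
    """Create one expert: a two-layer FFN with its own weight matrices."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    return {"w1": w1, "w2": w2}

def matvec(x, w):
    """Compute x @ w for a vector x (length n) and a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def ffn_forward(expert, x):
    """Apply linear -> ReLU -> linear using this expert's weights only."""
    hidden = [max(0.0, h) for h in matvec(x, expert["w1"])]
    return matvec(hidden, expert["w2"])

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_w, top_k=2):
    """Route one token to its top_k experts and mix their outputs."""
    logits = matvec(x, router_w)  # one router score per expert
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts (assumed scheme)
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = ffn_forward(experts[idx], x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

# Hypothetical toy sizes; real models use far larger dimensions.
rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 4
experts = [make_ffn(d_model, d_hidden, rng) for _ in range(n_experts)]
router_w = [[rng.gauss(0, 1.0) for _ in range(n_experts)] for _ in range(d_model)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, experts, router_w)
```

Because each entry in `experts` holds separate `w1` and `w2` matrices, any one expert can be inspected, swapped, or modified without touching the others; only the router decides which of them a given token actually uses.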