5). OLMoE – introduces a fully-open LLM that leverages sparse Mixture-of-Experts. OLMoE is a 7B parameter model and uses 1B active parameters per input token; there is also an instruction-tuned version that claims to outperform Llama-2-13B-Chat and DeepSeekMoE 16B.
OLMoE: Open Sparse Mixture-of-Experts Language Model
By
–