BigQwen2.5-125B

I made a few self-merges with the new Qwen2.5 models:

– The 125B model is based on Qwen2.5-72B-Instruct and follows the recipe of Meta-Llama-3-120B-Instruct (https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct…), where blocks of 10 layers are repeated 6 times.
– The 47B model is based on
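
To make the layer arithmetic of the 125B recipe concrete, here is a minimal sketch of one way to read "blocks of 10 layers are repeated". It is not the author's merge configuration. The 80-layer depth of the base model and the 20-layer window with a stride of 10 are assumptions used for illustration, and `overlapping_slices` is a hypothetical helper.

```python
from collections import Counter


def overlapping_slices(num_layers, window, stride):
    """Return [start, end) layer ranges for a sliding-window self-merge."""
    slices = []
    start = 0
    while start + window <= num_layers:
        slices.append((start, start + window))
        start += stride
    return slices


# Assumptions for illustration only, not taken from the post:
NUM_LAYERS = 80  # assumed depth of the 72B base model
WINDOW = 20      # assumed number of layers per slice
STRIDE = 10      # assumed offset between consecutive slices

slices = overlapping_slices(NUM_LAYERS, WINDOW, STRIDE)

# Depth of the merged model: every slice is stacked in order.
merged_depth = sum(end - start for start, end in slices)

# Count how often each original layer appears in the merged stack.
counts = Counter(layer for start, end in slices for layer in range(start, end))
duplicated = sorted(layer for layer, n in counts.items() if n > 1)
duplicated_blocks = len(duplicated) // STRIDE

print("slices:", slices)
print("merged depth:", merged_depth)
print("10-layer blocks that appear twice:", duplicated_blocks)
```

Under these assumptions, the interior layers appear twice in the merged stack while the first and last 10 layers appear once, so six 10-layer blocks are duplicated.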
