AI Dynamics

Global AI News Aggregator

BigQwen2.5-125B: Self-Merged Large Language Model

I made a few self-merges with the new Qwen2.5 models:
– The 125B model is based on Qwen2.5-72B-Instruct and follows the recipe of Meta-Llama-3-120B-Instruct (https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct…), where blocks of 10 layers are repeated 6 times.
– The 47B model is based on
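
To make the layer arithmetic behind the 125B recipe concrete, here is a minimal sketch of one way to read "blocks of 10 layers are repeated 6 times". It rests on assumptions the post does not state: that the base model has 80 transformer layers, and that the merge stacks overlapping 20-layer slices with a stride of 10. The helper name `sliding_slices` is hypothetical and not part of any merging tool.

```python
from collections import Counter


def sliding_slices(num_layers, window, stride):
    """Return [start, end) layer ranges for an overlapping self-merge.

    Hypothetical helper: it only computes index ranges and does not
    touch any model weights.
    """
    if window <= 0 or stride <= 0 or window > num_layers:
        raise ValueError("window and stride must be positive and fit the model")
    slices = []
    start = 0
    while start + window <= num_layers:
        slices.append((start, start + window))
        start += stride
    return slices


# Assumed values, not taken from the post: 80 base layers, 20-layer slices, stride 10.
slices = sliding_slices(num_layers=80, window=20, stride=10)

# Depth of the merged model: every slice contributes all of its layers.
total_layers = sum(end - start for start, end in slices)

# Count how often each base layer appears in the merged stack.
counts = Counter(layer for start, end in slices for layer in range(start, end))
duplicated_layers = sum(1 for n in counts.values() if n > 1)

print(slices)
print("merged depth:", total_layers)
print("base layers used twice:", duplicated_layers)
```

Under these assumptions the merged stack has 140 layers, and the 60 middle layers (six blocks of 10) each appear twice, which matches the wording of the post. Scaling 72B by 140/80 gives roughly 126B, close to the 125B in the name; the small gap is expected because embeddings and the output head are not duplicated.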

→ View original post on X — @maximelabonne