Looking at the config.json for both models:
v3 (left) vs v2.5 (right). Interesting things:

MoE-related:

    v3:
      "moe_intermediate_size": 2048,
      "n_routed_experts": 256,
      "n_shared_experts": 1,
      "num_experts_per_tok": 8

    v2.5:
      "moe_intermediate_size": 1536,
      "n_routed_experts": 160,
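To make the config values concrete, here is a minimal sketch of what top-k expert routing with these numbers looks like: each token's gate scores all 256 routed experts, but only the top 8 ("num_experts_per_tok") are selected, while the shared expert(s) process every token. The gate weights and toy dimensions here are illustrative assumptions, not DeepSeek's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_routed_experts = 256     # "n_routed_experts" in the v3 config
num_experts_per_tok = 8    # "num_experts_per_tok" (top-k)
hidden = 16                # toy hidden size, not the real model dim
tokens = 4                 # toy batch of 4 tokens

x = rng.standard_normal((tokens, hidden))
W_gate = rng.standard_normal((hidden, n_routed_experts))  # hypothetical gate weights

# Softmax over routing logits -> per-token probability for each routed expert.
logits = x @ W_gate
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# Pick the top-8 experts per token; only these experts run for that token.
topk_idx = np.argsort(probs, axis=-1)[:, -num_experts_per_tok:]
topk_w = np.take_along_axis(probs, topk_idx, axis=-1)

print(topk_idx.shape)  # (4, 8): 8 expert ids chosen per token out of 256
```

Note the sparsity this buys: only 8 of 256 routed experts fire per token in v3, so the per-token compute stays far below the total expert parameter count, while the single shared expert ("n_shared_experts": 1) is dense and always active.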
Comparing MoE Architecture: v3 vs v2.5 Model Configurations