I'm comparing w/ https://huggingface.co/deepseek-ai/DeepSeek-V2.5-1210
…
@reach_vb
-
DeepSeek-V2.5-1210 Model Comparison Analysis
-
Changes Between v2 and v3 Configuration
In case you're interested in what changed between v2 and v3 (in terms of config):
-
MoE Architecture Changes: Softmax to Sigmoid Gate Function
Dug a bit more into the modelling code (v2 vs v3), here are the key changes:
> MoE gate function changed from softmax (v2) → sigmoid (v3)
> New top-k selection method `noaux_tc`
> Added `e_score_correction_bias` for better expert selection, or even training
-
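The gating change above can be sketched roughly like this. This is a minimal pure-Python illustration, not DeepSeek's actual modelling code; the detail that the bias affects only *which* experts are selected (while the unbiased sigmoid scores are used as routing weights) is an assumption based on the description above:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_experts(logits, bias, top_k):
    """v3-style gating sketch: sigmoid scores, bias-corrected top-k selection.

    `logits` are the raw gate outputs per expert, `bias` plays the role of
    e_score_correction_bias (hypothetical values here).
    """
    # v3 change: sigmoid per-expert scores instead of a softmax over experts
    scores = [sigmoid(l) for l in logits]
    # The correction bias shifts the *selection* ranking only
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    topk = ranked[:top_k]
    # Routing weights come from the unbiased scores, renormalized over the top-k
    total = sum(scores[i] for i in topk)
    weights = {i: scores[i] / total for i in topk}
    return topk, weights
```

For example, an expert with a large positive correction bias can win a top-k slot even when its raw sigmoid score is low, which is the "better expert selection" effect mentioned above.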
Key Differences Between v2 and v3
In case people are interested in looking at the key differences between v2 and v3, here:
-
Possible Accidental Release Ahead of Planned Launch
I think it's an accidental release; going by the online rumours, they wanted to release tomorrow.
-
DeepSeek v3 vs v2: Key Architecture Configuration Differences
Dug into the config files a bit, key differences (according to the config files) v2 vs v3:
> vocab_size: v2: 102400 → v3: 129280
> hidden_size: v2: 4096 → v3: 7168
> intermediate_size: v2: 11008 → v3: 18432
> num_hidden_layers: v2: 30 → v3: 61
> num_attention_heads: v2: 32 → v3: 128
-
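A quick way to reproduce this kind of comparison yourself: pull the two `config.json` files and diff the dicts. `diff_configs` below is a hypothetical helper, and the hard-coded values are the ones listed above:

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values copied from the v2 and v3 config.json files (subset shown above)
v2 = {"vocab_size": 102400, "hidden_size": 4096, "intermediate_size": 11008,
      "num_hidden_layers": 30, "num_attention_heads": 32}
v3 = {"vocab_size": 129280, "hidden_size": 7168, "intermediate_size": 18432,
      "num_hidden_layers": 61, "num_attention_heads": 128}

for key, (old, new) in diff_configs(v2, v3).items():
    print(f"{key}: v2: {old} -> v3: {new}")
```

In practice you'd load the full `config.json` of each repo (e.g. via `json.load`) rather than hand-copying values; the diff then also surfaces keys that exist in only one of the two configs.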
Qwen Model License Clarification: Apache 2.0
Qwen is just Apache 2.0 tho – messaged them to rectify
-
Converting Models to Llama.cpp Format Guide
Pretty much yes! Just need to convert to llama.cpp format
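The usual flow for that conversion looks roughly like this (a sketch, not a tested recipe: script and tool names follow recent llama.cpp layouts and may differ in your checkout, and the model directory name is just an example):

```shell
# Grab llama.cpp and the Python deps its conversion script needs
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# Convert the downloaded Hugging Face checkpoint to GGUF (FP16)
python llama.cpp/convert_hf_to_gguf.py ./DeepSeek-V2.5-1210 \
    --outfile deepseek-v2.5-1210-f16.gguf --outtype f16

# Optionally quantize (requires building llama.cpp first to get llama-quantize)
./llama.cpp/llama-quantize deepseek-v2.5-1210-f16.gguf \
    deepseek-v2.5-1210-q4_k_m.gguf Q4_K_M
```

The resulting `.gguf` file can then be loaded with llama.cpp's runtime tools.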