Thanks for the correction, my logic for comparing defaults in config was a bit faulty. In general, the thing to note here is that they switched the MoE gate function to Sigmoid (instead of Softmax) – interestingly, they tried this in DeepSeekVL earlier. In addition they have a
@reach_vb
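The gate-function switch mentioned above can be illustrated with a minimal NumPy sketch (not DeepSeek's actual code). Both functions are monotone, so top-k picks the same experts, but the mixing weights after renormalizing over the selected experts differ:

```python
import numpy as np

def softmax_gate(logits):
    # Softmax couples the experts: scores compete and sum to 1.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sigmoid_gate(logits):
    # Sigmoid scores each expert independently in (0, 1); the scores
    # of the selected top-k are typically renormalized afterwards.
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, 1.0, 0.5, -1.0])
k = 2
for gate in (softmax_gate, sigmoid_gate):
    scores = gate(logits)
    topk = np.argsort(scores)[-k:][::-1]
    weights = scores[topk] / scores[topk].sum()  # renormalize over selected
    print(gate.__name__, topk.tolist(), np.round(weights, 3).tolist())
```

With softmax, the renormalized top-k weights are sharper (they come from exponentiated logits); with sigmoid, the per-expert affinities are flatter, which changes how strongly the top expert dominates.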
-
DeepSeek v3 ranks best open-weight LLM in LiveBench
LiveBench, reported by r/LocalLlama – DeepSeek v3 is the BEST open-weight LLM AND the SECOND BEST non-reasoning LLM, after `gemini-exp-1206`
-
Model Checkpoint Configuration Values Analysis
Digging into the config.json of the model checkpoint here, these are the actual values picked up by the model vs. the defaults above:
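A minimal sketch of the kind of field-by-field comparison being done here (`diff_configs` is a hypothetical helper; the example values are the MoE fields quoted in this thread, and a real config.json has many more keys):

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    out = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            out[key] = (a.get(key), b.get(key))
    return out

# Only the values quoted in the thread, for illustration.
v3_cfg = {"moe_intermediate_size": 2048, "n_routed_experts": 256}
v25_cfg = {"moe_intermediate_size": 1536, "n_routed_experts": 160}
print(diff_configs(v3_cfg, v25_cfg))
```

In practice one would `json.load` the two checkpoint configs and diff them the same way.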
-
Comparing MoE Architecture: v3 vs v2.5 Model Configurations
Looking at the config.json for both models:
v3 (left) vs v2.5 (right)

Interesting things, MoE related:

v3: "moe_intermediate_size": 2048, "n_routed_experts": 256, "n_shared_experts": 1, "num_experts_per_tok": 8
v2: "moe_intermediate_size": 1536, "n_routed_experts": 160,
-
AI Community Wishlist: TTS, Whisper, GPT-3.5 Architecture
here's my wishlist:
1. Text to Speech backbone (even a nerfed version would do tbh)
2. Better + faster, multilingual Whisper (ideally without the Enc-Dec architecture)
3. GPT 3.5
If none of the above, release some arch details, let the community cook!
-
Exploring DeepSeek Chat: Local AI Alternatives Comparison
not running locally, just exploring files w/ http://chat.deepseek.com
-
Understanding the Differences Between Two Expert Selection Methods
In case people want to understand the difference b/w the two expert selection methods:
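To make the difference concrete, here is a hypothetical NumPy contrast of the two selection styles (my own illustration, not DeepSeek's code): (1) plain top-k over all experts vs. (2) group-limited top-k, where only experts in the best-scoring groups are eligible:

```python
import numpy as np

scores = np.array([0.9, 0.1, 0.2, 0.8, 0.3, 0.4, 0.85, 0.05])

# (1) plain: take the 3 highest-scoring experts globally
plain = np.argsort(scores)[-3:][::-1]

# (2) group-limited: 4 groups of 2; rank groups by their max score,
# keep the best 2 groups, then take the top-3 among the survivors
groups = scores.reshape(4, 2)
best_groups = np.argsort(groups.max(axis=1))[-2:]
eligible = np.where(np.isin(np.arange(8) // 2, best_groups), scores, -np.inf)
limited = np.argsort(eligible)[-3:][::-1]

# Expert 3 scores 0.8 but its group loses the group-level ranking,
# so group-limited routing swaps it for a weaker expert in a kept group.
print(plain.tolist(), limited.tolist())
```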
-
Understanding noaux_tc Expert Selection Method with DeepSeek
Working w/ DeepSeek to understand the `noaux_tc` method (very meta). It is essentially a group-based approach to selecting experts: experts are divided into groups, and the top-k experts are picked from the best-scoring groups. Step-by-step explanation via DS:
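The group-based selection can be sketched in NumPy under stated assumptions (group score = sum of the group's top-2 affinities, final top-k taken only from the kept groups; the function and parameter names are illustrative, not DeepSeek's actual code):

```python
import numpy as np

def group_limited_topk(scores, n_group, topk_group, top_k):
    """Group-limited expert selection, noaux_tc-style sketch.

    scores: per-expert affinities (e.g. sigmoid gate outputs), shape (n_experts,)
    1. Split experts into n_group equal groups.
    2. Score each group by the sum of its two best affinities.
    3. Keep the topk_group best groups; mask the rest with -inf.
    4. Return the indices of the overall top_k surviving experts.
    """
    n_experts = scores.shape[0]
    grouped = scores.reshape(n_group, n_experts // n_group)
    group_scores = np.sort(grouped, axis=1)[:, -2:].sum(axis=1)
    keep = np.argsort(group_scores)[-topk_group:]
    mask = np.full(n_group, -np.inf)
    mask[keep] = 0.0
    masked = (grouped + mask[:, None]).reshape(-1)
    return np.argsort(masked)[-top_k:][::-1]

rng = np.random.default_rng(0)
affinities = 1 / (1 + np.exp(-rng.normal(size=16)))  # sigmoid affinities
print(group_limited_topk(affinities, n_group=4, topk_group=2, top_k=4))
```

The bias term DeepSeek adds to balance expert load (the "aux-loss-free" part) is omitted here for brevity.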
-
Instruct Outperforms Claude 3.5 Sonnet in Aider Benchmark
> Instruct beats Claude 3.5 Sonnet in Aider bench

Presumably they have API access (not sure if v3 preview is on the API yet – maybe they got private access if not)