Thanks for the correction, my logic for comparing defaults in config was a bit faulty. In general, the thing to note here is that they switched the MoE gate function to Sigmoid (instead of Softmax) – interestingly, they tried this in DeepSeekVL earlier. In addition they have a
@reach_vb
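The gate-function switch mentioned above can be illustrated with a minimal NumPy sketch (not DeepSeek's actual code). Both functions are monotone, so top-k picks the same experts, but the mixing weights after renormalizing over the selected experts differ:

```python
import numpy as np

def softmax_gate(logits):
    # Softmax couples the experts: scores compete and sum to 1.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sigmoid_gate(logits):
    # Sigmoid scores each expert independently in (0, 1); the scores
    # of the selected top-k are typically renormalized afterwards.
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, 1.0, 0.5, -1.0])
k = 2
for gate in (softmax_gate, sigmoid_gate):
    scores = gate(logits)
    topk = np.argsort(scores)[-k:][::-1]
    weights = scores[topk] / scores[topk].sum()  # renormalize over selected
    print(gate.__name__, topk.tolist(), np.round(weights, 3).tolist())
```

With softmax, the renormalized top-k weights are sharper (they come from exponentiated logits); with sigmoid, the per-expert affinities are flatter, which changes how strongly the top expert dominates.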
-
DeepSeek v3 ranks best open-weight LLM in LiveBench
LiveBench, reported by r/LocalLlama – DeepSeek v3 is the BEST open-weight LLM AND the SECOND BEST non-reasoning LLM, after `gemini-exp-1206`
-
Model Checkpoint Configuration Values Analysis
Digging into the config.json of the model checkpoint here, these are the actual values picked up by the model vs. the defaults above:
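A minimal sketch of the kind of field-by-field comparison being done here (`diff_configs` is a hypothetical helper; the example values are the MoE fields quoted in this thread, and a real config.json has many more keys):

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    out = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            out[key] = (a.get(key), b.get(key))
    return out

# Only the values quoted in the thread, for illustration.
v3_cfg = {"moe_intermediate_size": 2048, "n_routed_experts": 256}
v25_cfg = {"moe_intermediate_size": 1536, "n_routed_experts": 160}
print(diff_configs(v3_cfg, v25_cfg))
```

In practice one would `json.load` the two checkpoint configs and diff them the same way.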
-
Comparing MoE Architecture: v3 vs v2.5 Model Configurations
Looking at the config.json for both models:
v3 (left) vs v2.5 (right)

Interesting things, MoE related:

v3: "moe_intermediate_size": 2048, "n_routed_experts": 256, "n_shared_experts": 1, "num_experts_per_tok": 8
v2: "moe_intermediate_size": 1536, "n_routed_experts": 160,
-
AI Community Wishlist: TTS, Whisper, GPT-3.5 Architecture
here's my wishlist:
1. Text to Speech backbone (even a nerfed version would do tbh)
2. Better + faster, multilingual Whisper (ideally without the Enc-Dec architecture)
3. GPT 3.5
If none of the above, release some arch details, let the community cook!
-
Exploring DeepSeek Chat: Local AI Alternatives Comparison
not running locally, just exploring files w/ http://chat.deepseek.com
-
Understanding the Differences Between Two Expert Selection Methods
In case people want to understand the difference b/w the two expert selection methods:
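To make the difference concrete, here is a hypothetical NumPy contrast of the two selection styles (my own illustration, not DeepSeek's code): (1) plain top-k over all experts vs. (2) group-limited top-k, where only experts in the best-scoring groups are eligible:

```python
import numpy as np

scores = np.array([0.9, 0.1, 0.2, 0.8, 0.3, 0.4, 0.85, 0.05])

# (1) plain: take the 3 highest-scoring experts globally
plain = np.argsort(scores)[-3:][::-1]

# (2) group-limited: 4 groups of 2; rank groups by their max score,
# keep the best 2 groups, then take the top-3 among the survivors
groups = scores.reshape(4, 2)
best_groups = np.argsort(groups.max(axis=1))[-2:]
eligible = np.where(np.isin(np.arange(8) // 2, best_groups), scores, -np.inf)
limited = np.argsort(eligible)[-3:][::-1]

# Expert 3 scores 0.8 but its group loses the group-level ranking,
# so group-limited routing swaps it for a weaker expert in a kept group.
print(plain.tolist(), limited.tolist())
```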
-
Understanding noaux_tc Expert Selection Method with DeepSeek
Working w/ DeepSeek to understand the `noaux_tc` method (very meta). It is essentially a group-based approach to selecting experts: experts are divided into groups, and the top-k experts are picked from the best-scoring groups. Step-by-step explanation via DS:
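The group-based selection can be sketched in NumPy under stated assumptions (group score = sum of the group's top-2 affinities, final top-k taken only from the kept groups; the function and parameter names are illustrative, not DeepSeek's actual code):

```python
import numpy as np

def group_limited_topk(scores, n_group, topk_group, top_k):
    """Group-limited expert selection, noaux_tc-style sketch.

    scores: per-expert affinities (e.g. sigmoid gate outputs), shape (n_experts,)
    1. Split experts into n_group equal groups.
    2. Score each group by the sum of its two best affinities.
    3. Keep the topk_group best groups; mask the rest with -inf.
    4. Return the indices of the overall top_k surviving experts.
    """
    n_experts = scores.shape[0]
    grouped = scores.reshape(n_group, n_experts // n_group)
    group_scores = np.sort(grouped, axis=1)[:, -2:].sum(axis=1)
    keep = np.argsort(group_scores)[-topk_group:]
    mask = np.full(n_group, -np.inf)
    mask[keep] = 0.0
    masked = (grouped + mask[:, None]).reshape(-1)
    return np.argsort(masked)[-top_k:][::-1]

rng = np.random.default_rng(0)
affinities = 1 / (1 + np.exp(-rng.normal(size=16)))  # sigmoid affinities
print(group_limited_topk(affinities, n_group=4, topk_group=2, top_k=4))
```

The bias term DeepSeek adds to balance expert load (the "aux-loss-free" part) is omitted here for brevity.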
-
Instruct Outperforms Claude 3.5 Sonnet in Aider Benchmark
> Instruct beats Claude 3.5 Sonnet in Aider bench

Presumably they have API access (not sure if v3 preview is on the API yet – maybe they got private access if not)