I'm comparing w/ https://huggingface.co/deepseek-ai/DeepSeek-V2.5-1210
…
@reach_vb
-
DeepSeek-V2.5-1210 Model Comparison Analysis
-
Changes Between v2 and v3 Configuration
In case you're interested in what changed between v2 and v3 (in terms of config):
-
MoE Architecture Changes: Softmax to Sigmoid Gate Function
Dug a bit more into the modelling code (v2 vs v3), here are the key changes:
> MoE gate function changed from softmax (v2) → sigmoid (v3)
> New top-k selection method `noaux_tc`
> Added `e_score_correction_bias` for better expert selection, or even training
-
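The gating change above can be sketched roughly like this. This is a minimal pure-Python illustration, not DeepSeek's actual modelling code; the detail that the bias affects only *which* experts are selected (while the unbiased sigmoid scores are used as routing weights) is an assumption based on the description above:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_experts(logits, bias, top_k):
    """v3-style gating sketch: sigmoid scores, bias-corrected top-k selection.

    `logits` are the raw gate outputs per expert, `bias` plays the role of
    e_score_correction_bias (hypothetical values here).
    """
    # v3 change: sigmoid per-expert scores instead of a softmax over experts
    scores = [sigmoid(l) for l in logits]
    # The correction bias shifts the *selection* ranking only
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    topk = ranked[:top_k]
    # Routing weights come from the unbiased scores, renormalized over the top-k
    total = sum(scores[i] for i in topk)
    weights = {i: scores[i] / total for i in topk}
    return topk, weights
```

For example, an expert with a large positive correction bias can win a top-k slot even when its raw sigmoid score is low, which is the "better expert selection" effect mentioned above.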
Key Differences Between v2 and v3
In case people are interested in looking at the key differences between v2 and v3, here:
-
Possible Accidental Release Ahead of Planned Launch
I think it's an accidental release; going by the online rumours, they wanted to release tomorrow.
-
DeepSeek v3 vs v2: Key Architecture Configuration Differences
Dug into the config files a bit, key differences (according to the config files) v2 vs v3:
> vocab_size: v2: 102400 → v3: 129280
> hidden_size: v2: 4096 → v3: 7168
> intermediate_size: v2: 11008 → v3: 18432
> num_hidden_layers: v2: 30 → v3: 61
> num_attention_heads: v2: 32 → v3: 128
-
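A quick way to reproduce this kind of comparison yourself: pull the two `config.json` files and diff the dicts. `diff_configs` below is a hypothetical helper, and the hard-coded values are the ones listed above:

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values copied from the v2 and v3 config.json files (subset shown above)
v2 = {"vocab_size": 102400, "hidden_size": 4096, "intermediate_size": 11008,
      "num_hidden_layers": 30, "num_attention_heads": 32}
v3 = {"vocab_size": 129280, "hidden_size": 7168, "intermediate_size": 18432,
      "num_hidden_layers": 61, "num_attention_heads": 128}

for key, (old, new) in diff_configs(v2, v3).items():
    print(f"{key}: v2: {old} -> v3: {new}")
```

In practice you'd load the full `config.json` of each repo (e.g. via `json.load`) rather than hand-copying values; the diff then also surfaces keys that exist in only one of the two configs.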
Qwen Model License Clarification: Apache 2.0
Qwen is just Apache 2.0 tho – messaged them to rectify
-
Converting Models to Llama.cpp Format Guide
Pretty much yes! Just need to convert to llama.cpp format
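The usual flow for that conversion looks roughly like this (a sketch, not a tested recipe: script and tool names follow recent llama.cpp layouts and may differ in your checkout, and the model directory name is just an example):

```shell
# Grab llama.cpp and the Python deps its conversion script needs
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# Convert the downloaded Hugging Face checkpoint to GGUF (FP16)
python llama.cpp/convert_hf_to_gguf.py ./DeepSeek-V2.5-1210 \
    --outfile deepseek-v2.5-1210-f16.gguf --outtype f16

# Optionally quantize (requires building llama.cpp first to get llama-quantize)
./llama.cpp/llama-quantize deepseek-v2.5-1210-f16.gguf \
    deepseek-v2.5-1210-q4_k_m.gguf Q4_K_M
```

The resulting `.gguf` file can then be loaded with llama.cpp's runtime tools.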