It literally doesn't matter – all models at this point are trained on OAI or other slop. It's still a pretty strong model; vibe check it, but not this way!
@reach_vb
-
SGLang Benchmarking DeepSeek V3 Performance
ref: SGLang going for the win! https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3
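For anyone who wants to try this locally, here is a minimal sketch of timing DeepSeek-V3 through SGLang's OpenAI-compatible endpoint. It assumes SGLang's standard server setup and default port, not the exact harness in the linked benchmark folder; the prompt and max_tokens are illustrative.

```python
# Launch the server first, e.g.:
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
#       --tp 8 --trust-remote-code
import time

from openai import OpenAI

# SGLang exposes an OpenAI-compatible API, by default on port 30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

start = time.time()
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Explain MoE routing in one paragraph."}],
    max_tokens=256,
)
elapsed = time.time() - start

tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```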
-
V4 AI Model Direction Beyond Transformers Architecture
So… V4 would likely not be Transformers? I wonder what direction they would lean toward!
-
DeepSeek-V3 Report Released on GitHub
Forgot to link the report: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
-
DeepSeek Technical Report: 14.8T Tokens, GPT-4o Performance
The DeepSeek Technical Report is out!! Trained on 14.8 trillion tokens, it outperforms all open-source models and is comparable to GPT-4o and Claude-3.5-Sonnet.
Key contributions:
> Load Balancing Strategy: introduced an auxiliary-loss-free approach to minimize the performance degradation
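For context, a toy sketch of what "auxiliary-loss-free" load balancing means in the report: instead of adding a balancing loss term, each expert carries a bias that affects only top-k expert selection and is nudged after each step based on observed load. The function names and the update rate here are illustrative, not taken from DeepSeek's code.

```python
import numpy as np

def biased_topk_routing(affinity, bias, k):
    # The bias is added only for expert *selection*; the unbiased affinity
    # still determines the gating weights, so the balancing mechanism does
    # not directly distort the model's output.
    return np.argsort(affinity + bias, axis=-1)[:, -k:]

def update_bias(bias, expert_load, gamma=0.001):
    # After each step: lower the bias of overloaded experts (load above
    # the mean) and raise it for underloaded ones, by a fixed step gamma.
    return bias - gamma * np.sign(expert_load - expert_load.mean())
```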
-
DeepSeek Chat Platform Offers Significantly Faster Performance
At least on http://chat.deepseek.com it's much, much faster!
-
Model Configuration and Next Predict Layers Parameter Discussion
Yeah, config.json + modeling looks pretty much the same. Still no reference to "num_nextn_predict_layers" in the modeling code.
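A quick way to check this yourself: pull just the config from the Hub and look for the key. A minimal sketch using huggingface_hub; the repo id is the public DeepSeek-V3 one, everything else is illustrative.

```python
import json

from huggingface_hub import hf_hub_download

# Download only config.json and look for the multi-token-prediction key
# that has no counterpart in the modeling code yet.
path = hf_hub_download("deepseek-ai/DeepSeek-V3", "config.json")
with open(path) as f:
    config = json.load(f)

print(config.get("num_nextn_predict_layers"))  # set in config.json
```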
-
DeepSeek Instruct Open Weight LLM Released on Hub
Holy fuck! They also dropped the Instruct model on the Hub – that's literally the same model that runs on DeepSeek Chat! That's the best open-weight LLM right now, and second best on AiderBench (after o1). Now we wait for the model card!
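For completeness, the standard way to pull the weights once the card lands: a minimal sketch, assuming the usual transformers loading path. DeepSeek-V3 ships custom modeling code, so trust_remote_code is needed, and actually running the full model takes a multi-GPU node; this is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
)
```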