It literally doesn't matter – all models at this point are trained on OAI or other slop. It's still a pretty strong model; vibe check it, but not this way!
@reach_vb
-
SGLang Benchmarking DeepSeek V3 Performance
ref: SGLang going for the win! https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3
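For anyone who wants to try this locally, here is a minimal sketch of timing DeepSeek-V3 through SGLang's OpenAI-compatible endpoint. It assumes SGLang's standard server setup and default port, not the exact harness in the linked benchmark folder; the prompt and max_tokens are illustrative.

```python
# Launch the server first, e.g.:
#   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
#       --tp 8 --trust-remote-code
import time

from openai import OpenAI

# SGLang exposes an OpenAI-compatible API, by default on port 30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

start = time.time()
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Explain MoE routing in one paragraph."}],
    max_tokens=256,
)
elapsed = time.time() - start

tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```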
-
V4 AI Model Direction Beyond Transformers Architecture
So… V4 would likely not be Transformers? I wonder what direction they would lean toward!
-
DeepSeek-V3 Report Released on GitHub
Forgot to link the report: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
-
DeepSeek Technical Report: 14.8T Tokens, GPT-4o Performance
The DeepSeek Technical Report is out!! Trained on 14.8 trillion tokens, it outperforms all open-source models and is comparable to GPT-4o and Claude-3.5-Sonnet.
Key contributions:
> Load Balancing Strategy: introduced an auxiliary-loss-free approach to minimize the performance degradation
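For context, a toy sketch of what "auxiliary-loss-free" load balancing means in the report: instead of adding a balancing loss term, each expert carries a bias that affects only top-k expert selection and is nudged after each step based on observed load. The function names and the update rate here are illustrative, not taken from DeepSeek's code.

```python
import numpy as np

def biased_topk_routing(affinity, bias, k):
    # The bias is added only for expert *selection*; the unbiased affinity
    # still determines the gating weights, so the balancing mechanism does
    # not directly distort the model's output.
    return np.argsort(affinity + bias, axis=-1)[:, -k:]

def update_bias(bias, expert_load, gamma=0.001):
    # After each step: lower the bias of overloaded experts (load above
    # the mean) and raise it for underloaded ones, by a fixed step gamma.
    return bias - gamma * np.sign(expert_load - expert_load.mean())
```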
-
DeepSeek Chat Platform Offers Significantly Faster Performance
At least on http://chat.deepseek.com it's much, much faster!
-
Model Configuration and Next Predict Layers Parameter Discussion
Yeah, config.json + modeling looks pretty much the same. Still no reference to "num_nextn_predict_layers" in the modeling code.
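A quick way to check this yourself: pull just the config from the Hub and look for the key. A minimal sketch using huggingface_hub; the repo id is the public DeepSeek-V3 one, everything else is illustrative.

```python
import json

from huggingface_hub import hf_hub_download

# Download only config.json and look for the multi-token-prediction key
# that has no counterpart in the modeling code yet.
path = hf_hub_download("deepseek-ai/DeepSeek-V3", "config.json")
with open(path) as f:
    config = json.load(f)

print(config.get("num_nextn_predict_layers"))  # set in config.json
```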
-
DeepSeek Instruct Open Weight LLM Released on Hub
Holy fuck! They also dropped the Instruct model on the Hub – that's literally the same model that runs on DeepSeek Chat! That's the best open-weight LLM right now, and second best on AiderBench (after o1). Now we wait for the model card!
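For completeness, the standard way to pull the weights once the card lands: a minimal sketch, assuming the usual transformers loading path. DeepSeek-V3 ships custom modeling code, so trust_remote_code is needed, and actually running the full model takes a multi-GPU node; this is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
)
```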