S-Tier Chinese Labs: Moonshot and DeepSeek
These 2 are levels above everyone else
@theahmadosman
-

Consulting for migrating off Claude Code to on-premise LLMs
If your business / enterprise is trying to migrate off Claude Code / OpenAI and wants to host its LLMs on-premise, I do consulting for that kind of stuff btw. Bonus: you’ll be set up with a path forward to training models on your own tasks / workflows and save so much $$$ long term.
-

Full-load GPU limited to 220W with DFlash, DDTree optimizations
Looks like this under full load btw. Lots of juice to squeeze yet with DFlash / DDTree / Spec. Decoding / etc. Also, power limiting the GPUs to 220 W, down from 440 W (okay w/ taking the perf. loss given the heat / energy savings from that).
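For anyone wanting to try the same power cap, here is a minimal sketch that builds the `nvidia-smi` commands for it. The helper name `power_limit_commands` and the two-GPU example are assumptions for illustration, not the setup from the post. It only prints the commands (a dry run), since actually applying a limit typically needs administrator privileges and a value within the range the card supports.

```python
import shlex


def power_limit_commands(gpu_indices, watts):
    """Build one nvidia-smi invocation per GPU that caps board power at `watts`.

    Hypothetical helper: it returns argument lists and never executes anything.
    """
    if watts <= 0:
        raise ValueError("watts must be positive")
    # -i selects a GPU by index, -pl sets the power limit in watts.
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


if __name__ == "__main__":
    # Dry run for two GPUs (assumed count); pass each list to subprocess.run to apply.
    for cmd in power_limit_commands(range(2), 220):
        print(shlex.join(cmd))
```

The limit usually does not persist across reboots, so a setup like this would likely reapply it at startup.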
-

14 RTX 3090s running 42 parallel agents at full 256k context
14x RTX 3090s + Qwen 3.6 27B
Running 42 agents IN PARALLEL at full 256k context
– exl3 6bpw
– fp8 KV Cache
– Aphrodite Inference Engine w/ tp=2, pp=7
The world of agents will run locally btw
-
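A rough back-of-the-envelope check of the 14-GPU setup above, as a sketch under stated assumptions: 24 GB per RTX 3090, "27B" taken as exactly 27e9 parameters, 6 bits per weight as a flat average, and "256k" read as 256 × 1024 tokens. It ignores activations, framework overhead, and uneven layer splits, so the leftover figure is an upper bound, not a measurement.

```python
GPUS = 14
VRAM_PER_GPU_GB = 24           # RTX 3090 (assumption: nominal 24 GB)
TP, PP = 2, 7                  # tensor-parallel and pipeline-parallel degrees
PARAMS = 27e9                  # assumption: "27B" == 27e9 parameters
BITS_PER_WEIGHT = 6            # exl3 6bpw, treated as a flat average
AGENTS = 42
CONTEXT_TOKENS = 256 * 1024    # assumption: "256k" == 262,144 tokens


def summarize():
    # One model replica spans TP * PP GPUs; here that is the whole rig.
    assert TP * PP == GPUS
    total_vram_gb = GPUS * VRAM_PER_GPU_GB
    weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
    leftover_gb = total_vram_gb - weights_gb
    return {
        "total_vram_gb": total_vram_gb,
        "weights_gb": weights_gb,
        "leftover_gb": leftover_gb,
        "leftover_per_agent_gb": leftover_gb / AGENTS,
        "total_context_tokens": AGENTS * CONTEXT_TOKENS,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the quantized weights take about 20 GB of the 336 GB total, which is why most of the memory budget ends up going to KV cache, and why an fp8 KV cache matters at this context length.
-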
Infrastructure is the final boss of AI
Everything in AI is an infra problem btw. Like, that’s always gonna be the final boss.
-
Kimi K2.6 remains the top open-source SoTA model
Kimi K2.6 is still the current open-source SoTA model
-
The Trend Towards Small and Specialized AI Models
All roads lead to small and specialized models btw
-
Optimizing Unified Memory with Speculative Decoding on Apple Hardware
You’re optimizing the right things for Unified Memory. There are some shortcomings to Speculative Decoding (especially in more creative / less coding-focused workflows), but the pros far outweigh the cons given the Unified Memory limitations. Gonna try this on my Apple hardware tonight.
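To make the tradeoff concrete, here is a toy sketch of the greedy variant of speculative decoding. Both "models" (`target_next`, `draft_next`) are hypothetical stand-in functions, not real LLMs, and the draft is rigged to disagree with the target at every fourth position. A real engine would score all drafted positions in a single batched target pass; the inner verification loop only simulates that.

```python
def target_next(tokens):
    # Hypothetical "large" model: deterministic next token from the prefix.
    return (sum(tokens) * 31 + len(tokens)) % 10


def draft_next(tokens):
    # Hypothetical cheap model: matches the target except when len % 4 == 0.
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 10


def greedy_generate(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_generate(prompt, n_new, k=4):
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1) The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) The target verifies them (one batched pass in a real engine).
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    out, passes = speculative_generate([1, 2, 3], n_new=20)
    print(out == greedy_generate([1, 2, 3], 20), passes)
```

The output matches plain greedy decoding token for token, and the speedup comes entirely from how many draft tokens get accepted per target pass. When the draft model guesses poorly, which is more likely on open-ended creative text than on predictable code, each pass accepts fewer tokens and the benefit shrinks.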
-
Summer of wild innovations for open-source AI
This will be the summer where we make crazy wild things happen in open-source AI
