@theahmadosman

–

03 June 2026 19h45

S-Tier Chinese labs: Moonshot and DeepSeek. These 2 are levels above everyone else.

Consulting for migrating off Claude Code to on-premise LLMs

03 June 2026 18h14

If your business / enterprise is trying to migrate off Claude Code / OpenAI and host its LLMs on-premise, I do consulting for that kind of stuff btw. Bonus: you'll be set up with a path forward to training models on your own tasks / workflows and save so much $$$ long term.

GPUs at full load, power-limited to 220W, with DFlash / DDTree optimizations still ahead

03 June 2026 5h58

Looks like this under full load btw. Lots of juice left to squeeze with DFlash / DDTree / spec. decoding / etc. Also power-limiting the GPUs to 220W, down from 440W (okay w/ leaving the perf. loss on the table given the heat / energy savings).
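A minimal sketch of how a per-card power cap like this is commonly applied with `nvidia-smi -i <index> -pl <watts>` (run as root; check each card's allowed range with `nvidia-smi -q -d POWER` first). It assumes the 14-card rig from the 4h52 post; the helper names are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

# Per-card limits quoted in the post; the GPU count is assumed from the 14x RTX 3090 rig.
STOCK_LIMIT_W = 440
CAPPED_LIMIT_W = 220
NUM_GPUS = 14


def power_limit_commands(num_gpus: int, watts: int) -> list[list[str]]:
    """Build one `nvidia-smi -i <index> -pl <watts>` command per GPU (applying them needs root)."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in range(num_gpus)]


def power_ceiling_w(num_gpus: int, watts: int) -> int:
    """Aggregate board-power ceiling; actual draw under load can sit below this."""
    return num_gpus * watts


if __name__ == "__main__":
    for cmd in power_limit_commands(NUM_GPUS, CAPPED_LIMIT_W):
        print(shlex.join(cmd))  # swap for subprocess.run(cmd, check=True) to actually apply
    saved = power_ceiling_w(NUM_GPUS, STOCK_LIMIT_W) - power_ceiling_w(NUM_GPUS, CAPPED_LIMIT_W)
    print(f"ceiling drops by {saved} W")  # 6160 W -> 3080 W
```

Note this only lowers the ceiling: the real energy savings depend on how close the cards actually run to 440W under load, and the limit typically has to be reapplied after a reboot or driver reload.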

14 RTX 3090s running 42 parallel agents at full 256k context

03 June 2026 4h52

14x RTX 3090s + Qwen 3.6 27B, running 42 agents IN PARALLEL at full 256k context:
– exl3 6bpw
– fp8 KV cache
– Aphrodite Inference Engine w/ tp=2, pp=7

The world of agents will run locally btw
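A back-of-envelope sketch of what tp=2, pp=7 implies: 2 × 7 = 14 ranks, so a single model replica spans every card and serves all 42 agents as concurrent sequences. Assumptions, not from the post: the rank ordering (tensor-parallel ranks adjacent) is a common convention rather than necessarily Aphrodite's internal layout, "256k" is read as 262,144 tokens, and weight size counts only parameters × bits per weight. The helper names are hypothetical.

```python
TP, PP = 2, 7
AGENTS = 42
CTX_TOKENS = 256 * 1024  # assumes "256k" means 262,144 tokens


def rank_layout(tp: int, pp: int) -> dict[int, tuple[int, int]]:
    """Map each GPU rank to (pipeline_stage, tensor_rank), tensor-parallel ranks adjacent."""
    return {stage * tp + t: (stage, t) for stage in range(pp) for t in range(tp)}


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Size of quantized weights only (ignores higher-precision layers and engine overhead)."""
    return params * bits_per_weight / 8


layout = rank_layout(TP, PP)
print(len(layout))                  # 14: one replica spans every card
print(layout[13])                   # (6, 1): last pipeline stage, second tensor shard
print(AGENTS * CTX_TOKENS)          # 11010048 tokens of KV cache if every agent fills its context
print(weight_bytes(27e9, 6) / 1e9)  # 20.25 GB of weights at 6 bpw, before splitting across cards
```

The numbers suggest why the fp8 KV cache matters here: the weights are a small slice of 14 × 24 GB, and most of the remaining memory goes to holding roughly 11M tokens of context at once.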

Nostalgic message to Gemini 2.5 Pro

02 June 2026 15h39

I still think of you, Gemini 2.5 Pro 03-25

Infrastructure is the final boss of AI

02 June 2026 11h30

Everything in AI is an infra problem btw. Like, that's always gonna be the final boss.

Kimi K2.6 remains the top open-source SoTA model

02 June 2026 8h26

Kimi K2.6 is still the current open-source SoTA model

The Trend Towards Small and Specialized AI Models

02 June 2026 0h54

All roads lead to small and specialized models btw

Optimizing Unified Memory with Speculative Decoding on Apple Hardware

01 June 2026 23h59

You're optimizing the right things for Unified Memory. There are some shortcomings to speculative decoding (especially in more creative / less coding-heavy workflows), but the pros far outweigh the cons given the Unified Memory limitations. Gonna try this on my Apple hardware tonight.
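One way to see the trade-off is the standard speculative-decoding estimate (Leviathan et al., 2023): with γ draft tokens per step, each accepted independently with probability α, one target forward pass yields (1 − α^(γ+1)) / (1 − α) tokens on average. The sketch below assumes that i.i.d. acceptance model, and the α values are purely illustrative, not measurements. Creative prompts tend to be less predictable for a draft model, which lowers α and shrinks the gain.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model forward pass with gamma draft tokens,
    assuming each draft token is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def walltime_speedup(alpha: float, gamma: int, c: float) -> float:
    """Speedup over plain decoding when one draft step costs c target forward passes."""
    return expected_tokens_per_pass(alpha, gamma) / (gamma * c + 1)


# Illustrative acceptance rates only: a predictable (coding-like) vs. a looser (creative) workload.
for alpha in (0.8, 0.5):
    print(f"alpha={alpha}: {walltime_speedup(alpha, gamma=4, c=0.05):.2f}x")
```

This fits memory-bandwidth-bound hardware like Unified Memory machines: verifying γ + 1 tokens costs roughly one pass over the weights, so even a modest α can pay for the draft model.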