HUGE: a new agentic coding model that fits on 4x RTX 3090s @ 4-bit, fully local. KAT-Dev-72B-Exp by Kwaipilot – Claude Code setup guide included – ranks #2 on SWE-Bench Verified – excels at long-horizon coding + tool use – multi-stage tuned: Mid-Training, SFT + RFT, Agentic RL
@theahmadosman
-
LLM Infrastructure: Still Early, Much Work Ahead
By @theahmadosman
–
LLM infra right now is like Linux in the '90s: we're still early & there is a lot of work to do
-
Build Autograd Engine, Mini-GPT, and LoRA Fine-tuning From Scratch
By @theahmadosman
–
– build an autograd engine from scratch (see the sketch after this list)
– write a mini-GPT from scratch
– implement LoRA and fine-tune a model on real data
– hate CUDA at least once
– cry
– keep going
the roadmap – 5 phases
– if you already know something? skip
– if you're lost? rewatch
– if you're stuck? use
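for the first item on that list, a minimal sketch of a scalar autograd engine in the micrograd spirit; the `Value` class and its op set here are illustrative names, not any particular library's API:
```python
# minimal scalar autograd engine: each Value stores data, an accumulated
# gradient, and a closure that pushes gradients back to its parents
class Value:
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def _backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule output-to-input
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# sanity check: z = relu(x*y + y) at x=2, y=3
x, y = Value(2.0), Value(3.0)
z = (x * y + y).relu()
z.backward()
print(x.grad, y.grad)  # 3.0 3.0 (dz/dx = y, dz/dy = x + 1)
```
-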
GLM-4.5 Air Local Deployment for Sensitive Data Projects
By @theahmadosman
–
yeah i know, i am using glm-4.5 air locally for a sensitive-data project. otherwise that plan is great for its price. they're also dropping 4.6 air soon
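one hedged sketch of what "locally" can look like: serve the model on your own box (vLLM and llama.cpp's llama-server both expose an OpenAI-compatible endpoint) and point the client at localhost. the base_url, port, and model name below are assumptions, match them to your server:
```python
# query a locally served GLM-4.5 Air over an OpenAI-compatible endpoint
# (vLLM and llama.cpp's llama-server both expose one); prompts never
# leave the machine. base_url, port, and model name are assumptions,
# match them to whatever your local server registered.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")

resp = client.chat.completions.create(
    model="GLM-4.5-Air",  # assumed model id on the local server
    messages=[{"role": "user", "content": "summarize this internal doc: ..."}],
)
print(resp.choices[0].message.content)
```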
-
Ahmad Osman Endorses Open Superintelligence Stack Initiative
By @theahmadosman
–
ahmad osman here, co-signing this with every GPU i own. open superintelligence stack or bust. i approve this message
-
Home AI Server Building Best Practices Guide
By @theahmadosman
–
the basic rules i follow when building an AI server at home (see the lane check after this list)
>direct lanes, x16 or x8, from CPU and never off chipset
>no risers unless absolutely necessary
>airflow must be front-to-back, no hot recirculation
>power budget for transient spikes, not just average draw
>always
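a quick way to verify the "direct lanes" rule above, as promised: a minimal sketch using the pynvml bindings (`pip install nvidia-ml-py`); it assumes an NVIDIA driver is installed, and note the link generation can downshift at idle, so check under load:
```python
# verify each GPU negotiated the PCIe link you expect; a x16/x8 card
# reporting x4 or x1 is likely hanging off the chipset or a flaky riser.
# requires `pip install nvidia-ml-py` and an NVIDIA driver.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(h)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
        width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
        print(f"GPU {i}: {name} -> PCIe gen{gen} x{width}")
finally:
    pynvml.nvmlShutdown()
```
-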
RTX 5090 vs. 4x 3090s VRAM Comparison for LLM Inference
By @theahmadosman
–
a 5090 has 32GB of VRAM; 4x 3090s have 96GB. when it comes to LLM inference, we care more about memory: a model fully offloaded into VRAM runs better than one split between system RAM and a single RTX 5090's 32GB
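the napkin math behind that, as a sketch (weights only; real usage adds KV cache, activations, and runtime overhead on top):
```python
# napkin math: weight memory ~= params * bits / 8 bytes (weights only;
# KV cache, activations, and runtime overhead come on top of this)
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * bits / 8  # billions of params * bytes/param = GB

for bits in (16, 8, 4):
    need = weight_gb(72, bits)  # e.g. a 72B model like KAT-Dev-72B-Exp
    print(f"72B @ {bits}-bit: ~{need:.0f} GB weights | "
          f"fits 1x 5090 (32GB): {need <= 32} | fits 4x 3090 (96GB): {need <= 96}")
# 72B @ 4-bit: ~36 GB of weights alone, over a single 5090's 32GB
# before any KV cache, but comfortable across 4x 3090s' 96GB
```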
-
Economic Insecurity in the Age of AI Acceleration
By @theahmadosman
–
Nobody is safe in this economy. A C C E L E R A T E
-
Ollama's Bloated Wrapper Fails to Match ggml's Efficiency
By @theahmadosman
–
do not use Ollama. ggerganov wrote blazing-fast C++ inference (ggml, llama.cpp), then Ollama wrapped it in a bloated binary and is now somehow the face of local LLMs, soaking up VC hype. and it's not even a good wrapper lol
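if the point is to use ggerganov's stack directly, a minimal sketch via the llama-cpp-python bindings (`pip install llama-cpp-python`); the GGUF path below is a placeholder:
```python
# call ggerganov's stack (ggml/llama.cpp) directly via the
# llama-cpp-python bindings; no wrapper daemon in between.
# the model path is a placeholder; point it at any local GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload every layer to VRAM if it fits
)

out = llm("Q: what is ggml? A:", max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"])
```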