LLM Decoding Simplified
From the upcoming article on X
@theahmadosman
-
LLM inference: mostly about avoiding unnecessary movements
LLM inference is mostly about avoiding unnecessary movements btw
-
Elon asked to promote public LLMs 101 course
Elon, how about spreading the word around this to help me create a quality LLMs 101 course available to the public? We need more smart people in the field and this is an excellent way to do that
-
Dreaming of free LLM course inspired by CS50
I wanna teach an LLMs 101 course at an educational institution + have it recorded and available online to the public for free
I still have the email I sent at 13 thanking David J. Malan for putting CS50 online for free so I could watch it in Egypt
Who knows, maybe it’ll happen
-
Power limiting reduces inference performance by 10% but saves 50% electricity
Power limited* not throttled
Performance loss is negligible, around 10% in inference, but I save 50% in electricity
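To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the "50% in electricity" means average power draw during inference is halved and that throughput drops by the quoted 10%; the function name is made up for illustration, and real numbers will vary by GPU and workload.

```python
def energy_per_token_ratio(throughput_ratio: float, power_ratio: float) -> float:
    """Energy per token after power limiting, relative to the uncapped baseline.

    Energy per token = power / throughput, so the relative change is the
    ratio of the two scaling factors.
    """
    return power_ratio / throughput_ratio


# Figures from the post (assumed to be averages): ~10% slower, ~50% less power.
ratio = energy_per_token_ratio(throughput_ratio=0.90, power_ratio=0.50)
print(f"energy per token vs. baseline: {ratio:.2f}x")
print(f"tokens per joule vs. baseline: {1 / ratio:.2f}x")
```

Under those assumptions each token costs a bit more than half the energy it did before, which is why the trade looks worthwhile for a rig that runs all day.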
-

Kernels Are the Actual Work in Model Inference
You don’t “run a model”
You run Kernels
The model is just a graph
The Inference Engine is scheduler / optimizer / executor
But the actual work? That happens in the Kernels
– MatMul Kernels
– Attention Kernels
– RMSNorm Kernels
– KV cache Kernels
– Quantized linear Kernels
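The split between graph, engine, and kernels can be sketched in a few lines of plain Python. This is a toy, not how any real inference engine is written: the kernel functions, the `GRAPH` layout, and `run` are all hypothetical names, and real kernels are GPU routines rather than list comprehensions.

```python
import math


# Toy "kernels": plain-Python stand-ins for the routines that do the math.
def matmul_kernel(x, w):
    # x: vector of length n, w: n x m matrix (list of rows) -> vector of length m
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def rmsnorm_kernel(x, eps=1e-6):
    # Scale the vector by the inverse of its root-mean-square.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


KERNELS = {"matmul": matmul_kernel, "rmsnorm": rmsnorm_kernel}

# The "model": just a graph (here a linear chain) of op names and their weights.
GRAPH = [
    ("rmsnorm", {}),
    ("matmul", {"w": [[1.0, 0.0], [0.0, 2.0]]}),
]


def run(graph, x):
    # The "engine": walks the graph and dispatches each node to a kernel.
    for op, params in graph:
        x = KERNELS[op](x, **params)
    return x


out = run(GRAPH, [3.0, 4.0])
print(out)
```

Notice that `run` does no arithmetic itself. Swap in a faster `matmul_kernel` and the model gets faster without the graph or the engine changing at all.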
-

Google releases Gemma 4 QAT 4bit with 31B Dense and 26B MoE
Great news
Google just released the QAT (4bit) of their Gemma 4 model series, including the 31B Dense and the 26B MoE
Another W for Opensource AI this week
-
Nemotron 3 Ultra release brings hope for open-source AI
Finally, today's Nemotron 3 Ultra release makes me very hopeful for the future of Opensource AI
Jensen knows that this is important to keep the powers in check, and I believe he's sincere in his answer to me that there will be continuity to the Nemotron Coalition releases
Big W https://t.co/QTjZGR63dO
Ahmad (@TheAhmadOsman), June 5, 2026
-
Start Local AI with a single RTX 3090, buy it anon
All it takes to get started with Local AI is a single RTX 3090, so go buy that GPU anon
-

Improving Codex CLI with multi-agent delegation and memory
Let me make your Codex CLI experience better with
– Multi-agent delegation
– Enhanced memory
– Better artifacts
– Children AGENTS.md contextualization
– Runtime metrics (optional)
Run the command in the screenshot below to get them up and running
Let me know how you like it
