AI Dynamics

Global AI News Aggregator

COMPUTING

  • Day 83 GPU Programming: DeepSeek Multi-Head Latent Attention Optimization

    Day 83/365 of GPU Programming: Looking at DeepSeek's Multi-Head Latent Attention today. The last part of the AMD challenge series is to optimize an MLA decode kernel for MI355X, where the absorbed Q and compressed KV cache are given and your task is to do the attention computation. A resource that really helped me internalize what MLA does was @rasbt's incredible visual guide to attention variants in LLMs (luckily he posted it last week!), which covers everything from MHA to GQA to MLA to SWA, et cetera. If there's one place to get a visual intuition for recent attention mechanisms, it's this blog post. @jbhuang0604's video on MQA, GQA, MLA, and DSA was the best conceptual intro I found on the topic, progressively building up the ideas from first principles. The Welch Labs analysis of MLA is a great watch as well, with a beautiful visualization of the changes DeepSeek made for MLA. I tried out a few kernels once I had a basic understanding of MLA, and I think I'm slowly getting more comfortable with at least analyzing kernels.

    levi (@levidiamode), Day 82/365 of GPU Programming: Taking a closer look at Mixture of Experts today so I can write better MoE kernels; specifically, to optimize an MXFP4 MoE fused kernel for the GPU Mode challenge. I haven't had much prior exposure to MoEs, so there were lots of new concepts to learn today. Luckily I found the best intro to MoEs thanks to @MaartenGr's visual overview of the topic. I then watched @tatsu_hashimoto's amazing Stanford CS336 lecture on MoEs, which added deeper context around why MoEs are gaining popularity: FLOPs, OLMoE, infra complexity, routing functions (mind-blown this works so well…), expert sizes, training objectives, top-k routing, and DeepSeek variations. Once I had a basic understanding I started playing around with some AITER kernels, but progress there is TBD. Also had a nice chat with @juscallmevyom (who was kind enough to reach out!) about the AMD kernels and the challenge of materialization overhead.
— https://nitter.net/levidiamode/status/2037297869518950430#m

    → View original post on X — @levidiamode, 2026-03-27 22:49 UTC
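    The absorbed-Q formulation described above means decode attention can score directly against the compressed latent cache. A minimal NumPy sketch of that decode step (function and variable names are hypothetical, shapes are simplified, and MLA's decoupled RoPE branch is omitted):

```python
import numpy as np

def mla_decode(q_abs, c_kv, w_uv):
    """Toy MLA decode step for one new token (illustration only).

    q_abs: (n_heads, d_c)         queries with W_UK already absorbed, so
                                  scores are taken against the latent cache
    c_kv:  (seq_len, d_c)         compressed KV cache, shared by all heads
    w_uv:  (n_heads, d_c, d_head) per-head value up-projection
    """
    scores = q_abs @ c_kv.T / np.sqrt(q_abs.shape[-1])  # (n_heads, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)        # numerically stable
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)                  # softmax over cache
    latent = p @ c_kv                                   # (n_heads, d_c)
    return np.einsum('hc,hcd->hd', latent, w_uv)        # (n_heads, d_head)

out = mla_decode(np.ones((4, 8)), np.ones((10, 8)), np.ones((4, 8, 16)))
print(out.shape)  # (4, 16)
```

    The memory win is that `c_kv` stores a single d_c-dimensional latent per token regardless of head count; a real MI355X kernel would additionally fuse these steps and handle the RoPE branch.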

  • Why I Shill Droid 24/7: Eight Major Accomplishments Today

    Here's why I shill Droid 24/7. Today Droid single-handedly:

    1. Published a REAP of GLM-5 in FP8 (there's a reason no one else has done it: DSA is still very new): huggingface.co/0xSero/GLM-5-…
    2. Found and fixed an upstream issue with vLLM + DSA + Hopper where GLM-5's KV cache would need to recompute and spend 20x the time needed.
    3. Created multiple working quantisations on its own: it tried exl3 and autoround, but both failed (autoround 3-bit doesn't work on Ampere), so it resorted to GGUF: huggingface.co/0xSero/GLM-5-…
    4. Implemented github.com/0xSero/turboquant within 24 hours of the research paper coming out, and tested it across 5090s, 3090s, H100s, and B200s.
    5. Has been distilling larger models into LoRAs to help me test arxiv.org/abs/2505.21835, and it got an 80% prune to be semi-coherent again.
    6. Helped me find research papers and clean up slop with the human-writing skill.
    7. Got BYOK working in Cursor with Anthropic, ZAI, Kimi, MiniMax, and OpenAI: github.com/0xSero/factory-cu…
    8. Helped me implement the dynamic loading from blog.comfy.org/p/dynamic-vra… It only works on a tiny model, but still.

    I only have to check in on it every 30-45 minutes (and I am talking all 8 of my sessions); the thing will run for 16 hours with basically zero prep, all while I am mostly focused on my actual job and tweeting 24/7. Keep in mind each one of these experiments is running on a different server with different constraints; I don't understand how I can get such good results here.

    I love novelty, which is why I jump around and talk about all these different tools. I have used all of these harnesses and messed around with every feature. I keep coming back to this one, and I keep shilling it because I sincerely wish others get to experience this.

    → View original post on X — @nathanlands, 2026-03-27 19:36 UTC

  • MiniMax AI 2.5 Cloud Release: Developers Share First Test Ideas

    Since @MiniMax_AI 2.5 is available on our cloud, devs, we have a question for you: what's the first thing you'd test with the upgraded algorithm?

    → View original post on X — @sambanovaai,

  • T-Mobile Network Planning for Major Events and Emergency Communications

    Behind every major event is long-term network planning. Automation and AI-powered optimization help manage changing demand while strengthening everyday connectivity across key venues and transit hubs. More: https://t-mobile.com/news/network/t-mobile-bay-area-emergency-communications-big-game @TMobileBuiness Partner

    → View original post on X — @haroldsinnott,

  • Multimodal AI: The Future of Human-Computer Interaction According to Stanford

    At @NVIDIAGTC, @StanfordHAI's James @Landay said we're amid a major shift in human-computer interaction. Current text/voice AI is "just a blip" and he envisions a future of multimodal agents that anticipate user needs through voice, gesture, and context: nvidia.com/en-us/on-demand/s…

    → View original post on X — @stanfordhai, 2026-03-27 16:05 UTC

  • Mojo Kernels: Reducing conv2d Code from 870 to 130 Lines

    130 lines instead of 870. That's the difference between our conv2d implementation on Blackwell and CUTLASS's. We broke kernels into three swappable pieces: one for moving data, one for coordinating the pipeline, one for compute. When you need a new kernel, you only change the piece that actually needs to change. Part 3 of our Structured Mojo Kernels series walks through the details: modular.com/blog/structured-…

    → View original post on X — @jeremyphoward, 2026-03-27 15:00 UTC
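    The three-piece split described above can be sketched in a few lines. This is plain Python, not Modular's actual Mojo code, and every name here is hypothetical; it only illustrates the decomposition, with each piece exposed behind a narrow interface:

```python
import numpy as np

def tile_loader(inp, tile, i):
    """Data movement: fetch one input tile (stands in for global-to-shared copies)."""
    return inp[i * tile:(i + 1) * tile]

def compute_stage(tile_data, weight):
    """Compute: the per-tile math; swapping this piece changes the kernel's op."""
    return tile_data * weight

def pipeline(inp, weight, tile):
    """Coordination: iterates tiles, wiring loader and compute together."""
    n = len(inp) // tile
    return np.concatenate([compute_stage(tile_loader(inp, tile, i), weight)
                           for i in range(n)])

print(pipeline(np.arange(8.0), 2.0, 4))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```

    Because the coordinator only talks to the other two pieces through their call signatures, a new kernel swaps just the piece that differs and reuses the rest, which is where the line-count savings come from.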

  • SambaNova RDU: Dataflow Architecture for Efficient AI Processing

    Want faster, more efficient AI? It starts with dataflow—the natural way AI models process data. Our RDU is built for exactly that. See how it works: https://
    sambanova.ai/products/dataf
    low-architecture?utm_source=x&utm_medium=organic&utm_campaign=developer

    → View original post on X — @sambanovaai,
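    As a toy illustration of the dataflow idea (ordinary Python generators, not SambaNova's actual RDU programming model), each stage streams values to the next as soon as they are ready, instead of materializing a full intermediate tensor between kernel launches:

```python
# Hypothetical sketch: a three-stage "model" as a streaming dataflow pipeline.
def scale(xs, a):
    for x in xs:
        yield a * x        # stage 1: elementwise scale

def add_bias(xs, b):
    for x in xs:
        yield x + b        # stage 2: elementwise bias

def relu(xs):
    for x in xs:
        yield max(x, 0.0)  # stage 3: activation

inputs = iter([-2.0, -1.0, 0.5, 3.0])
stream = relu(add_bias(scale(inputs, 2.0), 1.0))
# each element flows through all three stages before the next is read
outputs = list(stream)
print(outputs)  # [0.0, 0.0, 2.0, 7.0]
```

    On real dataflow hardware the stages are spatial pipeline segments rather than Python generators, but the scheduling idea is the same: producers feed consumers directly.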

  • Tenstorrent Unveils New Cluster with 1TB VRAM and 3TB DDR5

    New Tenstorrent cluster, hot from the kitchen:
    > 1TB of VRAM
    > 3TB DDR5 RAM
    > 32TB SSD storage
    New product; will share more later. P.S. Can you find the cat in the picture?

    → View original post on X — @tenstorrent, 2026-03-27 14:40 UTC

  • Living Brain Cells Play DOOM: Cortical Labs Advances Neuromorphic Computing

    Living Brain Cells Play DOOM: Cortical Labs Pushes Neuromorphic Computing Forward
    by @IntEngineering #Innovation #EmergingTech #Technology #Tech

    → View original post on X — @ronald_vanloon,

  • Insurers Leverage Cloud Computing for Efficiency and Flexibility

    Modern insurance runs in the cloud. Outsourcing cloud-computing storage allows insurers like Stuttgarter Lebensversicherung a.G. to benefit from the flexibility & efficiency of the cloud without placing strain on limited internal IT resources. http://2.sas.com/6011B6nkDB

    → View original post on X — @sassoftware,