Gemma4 is amazing. You'll read that everywhere. Let's focus on what is HUGE here: the revenge of dense models… Throw away your B200, not needed anymore, throw away the millions of lines of code we had to write to make MoEs faster, training stable, etc… throw away your router-aware kernel, your EP DEEP GEMM, throw away the auxiliary loss function. Welcome to simplicity, dense is the new king. FINALLY hating MoEs is back to being chad. For those who know me: I was always a MoE doomer.
→ View original post on X — @jeremyphoward, 2026-04-02 16:23 UTC
