Global AI News Aggregator
Facebook added it too: developers.facebook.com/llms…
→ View original post on X — @jeremyphoward, 2026-03-31 02:37 UTC

Okay LLM + PyTorch people, trunc_normal_, what the fuck! Many LLM inits use it with the default cutoffs. It's either not doing anything or it's quite broken, due to two issues.

1. The a/b cutoffs in PyTorch are not in std-devs, they are absolute. So with std=0.02 and the default -2/2 cutoffs, that's 100σ!! That's just a normal distribution; the truncation isn't doing anything.

2. There are numerical issues. Even in float32, the truncation produces a handful of -2 (lower cutoff) values, i.e. 100σ!! That's incomprehensibly improbable. I doubt a float32 or even float64 algorithm could actually produce such a value, but clamping a bad float does.

Olmo (@allenai codebases) appears to be one of the few that uses trunc_normal_ and bothers to set the cutoffs properly. It'd be nice to see more training code opened up as a default. We so often end up with only a sanitized version of the inference/fine-tune-friendly model these days, and may lose details like the original init. I've known about #1 for ages; I have an alternate trunc_normal_tf_ implementation in timm for that reason. But I saw those -2's last week while debugging something and was a little surprised.
→ View original post on X — @jeremyphoward, 2026-03-30 15:09 UTC
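A minimal sketch of both issues, assuming a recent PyTorch build (the exact count of clamped values varies by tensor size and seed); the corrected call at the end simply scales the cutoffs by std:

```python
import torch
from torch import nn

w = torch.empty(4096, 4096)

# Issue 1: the default cutoffs a=-2.0, b=2.0 are absolute, not in std-devs,
# so with std=0.02 they sit at +/-100 sigma and the truncation is a no-op.
nn.init.trunc_normal_(w, mean=0.0, std=0.02)  # a=-2.0, b=2.0 by default
print(w.abs().max() / 0.02)  # ~5-6 sigma on a tensor this size, nowhere near 100

# Issue 2: numerical edge cases get clamped to the cutoff itself, yielding
# "impossible" 100-sigma values of exactly -2.0 on large tensors.
print((w == -2.0).sum())  # occasionally nonzero

# To truncate at 2 std-devs, as presumably intended, scale the cutoffs:
std = 0.02
nn.init.trunc_normal_(w, mean=0.0, std=std, a=-2 * std, b=2 * std)
print((w.abs() <= 2 * std).all())  # True: everything within 2 sigma
```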

Alec is a once-in-a-generation researcher, but saying that he invented pretraining is not only a bit of a stretch, it's also disrespectful to other people's work. Flowers ☾ (@flowersslop): "Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name." — https://nitter.net/flowersslop/status/2037892926785634720#m
→ View original post on X — @jeremyphoward, 2026-03-29 12:21 UTC

Every LLM from any lab today, including from OpenAI, traces back to @jeremyphoward and @seb_ruder with their ULMFiT paper. The breakthrough for LLMs was transfer learning, not attention. Flowers ☾ (@flowersslop): "Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name." — https://nitter.net/flowersslop/status/2037892926785634720#m
→ View original post on X — @jeremyphoward, 2026-03-28 20:34 UTC
We as software engineers are becoming beholden to a handful of well-funded corporations. While they are our "friends" now, that may change due to incentives. I'm very uncomfortable with that. I believe we need to band together as a community and create a public, free-to-use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us, so we, as a community, can do what e.g. Cursor does below, and take back a little bit of control. Who's with me? cursor.com/blog/real-time-rl…
→ View original post on X — @jeremyphoward, 2026-03-28 08:38 UTC


"naming their next model after Cthulhu" 😒 — "Naming their next model after Cthulhu makes it hard to take Anthropic seriously as the good guys. It's fun at any other software company, not one that actually is flirting with extinction." — Eliezer Yudkowsky (@allTheYud) [Translated from EN to English]
→ View original post on X — @jeremyphoward, 2026-03-27 18:36 UTC
130 lines instead of 870. That's the difference between our conv2d implementation on Blackwell and CUTLASS's. We broke kernels into three swappable pieces: one for moving data, one for coordinating the pipeline, one for compute. When you need a new kernel, you only change the piece that actually needs to change. Part 3 of our Structured Mojo Kernels series walks through the details: modular.com/blog/structured-…
→ View original post on X — @jeremyphoward, 2026-03-27 15:00 UTC
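The three-way split described above is an architectural pattern rather than anything Mojo-specific. A hedged sketch of the idea in Python (all names here are invented for illustration; this is not Modular's API): each piece sits behind a tiny interface, and the kernel driver just composes them, so a new kernel or a new hardware target changes only the piece that actually needs to change.

```python
from typing import Protocol, Sequence

class DataMover(Protocol):
    """Moves one tile of input into fast memory."""
    def load_tile(self, i: int) -> Sequence[float]: ...

class Scheduler(Protocol):
    """Decides the order and overlap of pipeline steps."""
    def steps(self, n_tiles: int) -> Sequence[int]: ...

class Compute(Protocol):
    """The actual math on a resident tile."""
    def apply(self, tile: Sequence[float]) -> float: ...

def run_kernel(mover: DataMover, sched: Scheduler, comp: Compute,
               n_tiles: int) -> list[float]:
    # The driver only wires the three pieces together; swapping in a
    # different DataMover (say, for new hardware) leaves the rest untouched.
    return [comp.apply(mover.load_tile(i)) for i in sched.steps(n_tiles)]
```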

Mac Mini + eGPU. Both NVIDIA and AMD supported.
→ View original post on X — @jeremyphoward, 2026-03-14 04:16 UTC