FACT: If you don't train your 350M model on 28T tokens, you're not optimal.

Quoting Nicholas Roberts (@nick11roberts): "That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it? As it turns out, we have a brand new scaling law for that! 🧵 [1/n]" (https://nitter.net/nick11roberts/status/2041141606305124486#m)
→ View original post on X — @maximelabonne, 2026-04-06 15:05 UTC
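For a sense of scale, here is a rough back-of-the-envelope check. 28T tokens on a 350M-parameter model works out to 80,000 training tokens per parameter. The sketch below uses the commonly cited Chinchilla rule of thumb of roughly 20 tokens per parameter as its "compute-optimal" reference point. That ratio is an assumption, not something stated in the post, and it is not the new scaling law the thread announces. The helper names are hypothetical.

```python
# Back-of-the-envelope overtraining check for a 350M model trained on 28T tokens.
# ASSUMPTION: ~20 tokens/parameter as the "Chinchilla-optimal" reference
# (a common rule of thumb, not the new scaling law from the thread).
CHINCHILLA_TOKENS_PER_PARAM = 20


def tokens_per_param(params: int, tokens: int) -> int:
    """Training tokens seen per model parameter (integer division)."""
    return tokens // params


def overtraining_factor(params: int, tokens: int,
                        ref_ratio: int = CHINCHILLA_TOKENS_PER_PARAM) -> int:
    """How many times the reference-optimal token budget was actually used."""
    return tokens // (ref_ratio * params)


params = 350_000_000          # 350M parameters
tokens = 28_000_000_000_000   # 28T training tokens

print(tokens_per_param(params, tokens))     # 80000
print(overtraining_factor(params, tokens))  # 4000
```

At 20 tokens per parameter, the reference budget for 350M parameters would be about 7B tokens, so 28T is roughly 4,000 times that.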
