FACT: If you don't train your 350M model on 28T tokens, you're not optimal. Nicholas Roberts (@nick11roberts): That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it? As it turns out, we have a brand new scaling law for that! 🧵 [1/n] — https://nitter.net/nick11roberts/status/2041141606305124486#m
— View original post on X — @maximelabonne, 2026-04-06 15:05 UTC
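The post doesn't state its new scaling law, but the "overtrained" framing can be made concrete with the well-known Chinchilla parametric loss from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, using the published fit constants. This is an illustrative sketch only, not the law the thread announces; the numbers below (350M parameters, 28T tokens) come from the post, everything else is an assumption:

```python
# Hedged illustration: Chinchilla parametric loss fit (Hoffmann et al., 2022).
# These constants are the published fit, NOT the new law from the thread.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

N = 350e6           # 350M parameters, as in the post
D_optimal = 20 * N  # classic rule of thumb: ~20 tokens per parameter
D_over = 28e12      # 28T tokens, the "overtrained" regime from the post

print(f"Chinchilla-optimal data for 350M: {D_optimal / 1e9:.0f}B tokens")
print(f"loss at 20N tokens: {loss(N, D_optimal):.3f}")
print(f"loss at 28T tokens: {loss(N, D_over):.3f}")
```

Under this fit, 28T tokens is roughly 4000x the classic compute-optimal budget for a 350M model, yet the predicted loss keeps dropping, which is why "overtraining" small models can still pay off when inference cost dominates.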
