AI Dynamics

Global AI News Aggregator

PyTorch trunc_normal_ initialization bugs in LLM training code

Okay LLM + PyTorch people, trunc_normal_, what the fuck! Many LLM inits use it with the default cutoffs, and it's either not doing anything or it's quietly broken, due to two issues.

1. The a/b cutoffs in PyTorch are absolute values, not multiples of the standard deviation. So with std=0.02 and the default -2/2 cutoffs, that's 100σ!! That's just a normal distribution; the truncation isn't doing anything.

2. There are numerical issues. Even in float32, the truncation produces a handful of values exactly at -2 (the lower cutoff), i.e. 100σ out!! That's incomprehensibly improbable. I doubt a float32 or even float64 algorithm could actually produce such a draw, but clamping a bad float value does.

OLMo (the @allenai codebases) appears to be one of the few that uses trunc_normal_ and bothers to set the cutoffs properly. It'd be nice to see more training code opened up as a default. So often we only end up with a sanitized version of the inference/fine-tune-friendly model these days, and details like the original init get lost.

I've known about #1 for ages; I have an alternate trunc_normal_tf_ implementation in timm for that reason. But I saw those -2s last week while debugging something and was a little surprised.

→ View original post on X — @jeremyphoward, 2026-03-30 15:09 UTC
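The arithmetic behind issue #1 can be checked without PyTorch at all. Below is a minimal pure-Python sketch (using the stdlib `random` module as a stand-in for the actual `torch.nn.init.trunc_normal_` call) showing why absolute cutoffs of ±2 with std=0.02 make the truncation a no-op; the `nn.init.trunc_normal_` line in the comments illustrates the fix described in the post and is not taken from any specific codebase.

```python
import random

# Issue 1: trunc_normal_'s a/b are ABSOLUTE cutoffs, not multiples of std.
# With std=0.02 and the defaults a=-2, b=2, the cutoffs sit at 100 sigma.
std = 0.02
a, b = -2.0, 2.0
print(b / std)  # cutoff expressed in sigma units: 100.0

# Draws from a plain N(0, 0.02) essentially never reach +/-2, so the
# "truncated" samples are statistically identical to untruncated ones.
rng = random.Random(0)
samples = [rng.gauss(0.0, std) for _ in range(1_000_000)]
print(max(abs(x) for x in samples))  # roughly 0.1 (~5 sigma), nowhere near 2

# The fix the post describes: express the cutoffs in units of std, e.g.
#   nn.init.trunc_normal_(w, std=0.02, a=-2 * 0.02, b=2 * 0.02)  # real 2-sigma cut
# (timm's trunc_normal_tf_ instead takes a/b in sigma units and rescales.)
```

The same arithmetic explains why the stray -2.0 values in issue #2 must come from clamping a corrupted float rather than from a legitimate sample: a genuine 100σ draw has probability on the order of exp(-5000).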
