1/4 We hit a strange logprob mismatch while training Jamba 3B with GRPO. Rollout logprobs and training-side recompute should match before any weight update. Ours didn't. That was the canary. 🧵
View original post on X: @ai21labs, 2026-03-26 11:42 UTC
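
The check the post describes can be sketched roughly as follows: before any weight update, compare the per-token logprobs recorded during rollout with the ones the training side recomputes for the same tokens. This is only an illustration under stated assumptions, not AI21's code. The function names `logprob_mismatch` and `is_on_policy`, the plain-list inputs, and the `tol=1e-3` threshold are all hypothetical.

```python
def logprob_mismatch(rollout_logprobs, recomputed_logprobs):
    """Return (max_abs_diff, mean_abs_diff) over per-token logprobs.

    Both inputs are assumed to be flat lists of floats covering the
    same sampled tokens, in the same order.
    """
    if len(rollout_logprobs) != len(recomputed_logprobs):
        raise ValueError("both sequences must cover the same tokens")
    diffs = [abs(a - b) for a, b in zip(rollout_logprobs, recomputed_logprobs)]
    if not diffs:
        return 0.0, 0.0
    return max(diffs), sum(diffs) / len(diffs)


def is_on_policy(rollout_logprobs, recomputed_logprobs, tol=1e-3):
    """True if the two sets of logprobs agree within tol (an assumed threshold)."""
    max_diff, _ = logprob_mismatch(rollout_logprobs, recomputed_logprobs)
    return max_diff <= tol


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    rollout = [-0.51, -1.20, -0.03]
    recompute = [-0.51, -1.45, -0.03]
    print(logprob_mismatch(rollout, recompute))
    print(is_on_policy(rollout, recompute))
```

Some numerical noise between an inference engine and a training framework is expected, so a sensible tolerance depends on the dtype and kernels in use. A gap well above that noise floor, with identical weights on both sides, is the kind of signal the post calls the canary.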
