Midtraining is a new part of many training pipelines, but when does it help and can it backfire? π€ In our new preprint, we use controlled experiments to pin this down. TL;DR; midtraining helps the most when it βbridgesβ pretraining and posttraining, and mitigates forgetting after posttraining. Timing is also very important. π§΅
β View original post on X β @jeande_d, 2026-02-24 18:12 UTC

Leave a Reply