NVIDIA Megatron Core Adds Muon and Advanced Optimizers for LLM Training

Training models at the scale of Kimi K2 and Qwen3 30B efficiently requires more than standard data-parallel tricks. NVIDIA Megatron Core now provides end-to-end support for emerging higher-order optimizers such as Muon, alongside research optimizers such as MOP and REKLS, to push training efficiency further.
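Muon's distinguishing step is that, for 2-D weight matrices, it orthogonalizes the momentum buffer with a quintic Newton-Schulz iteration before applying the update. The following is a minimal NumPy sketch of that idea, not Megatron Core's actual implementation; the iteration coefficients and the shape-based scaling heuristic follow the publicly available Muon reference code, and the function names are illustrative only.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Approximately map G to the nearest semi-orthogonal matrix
    (singular values pushed toward 1) via a quintic Newton-Schulz
    iteration. Coefficients follow the public Muon reference code."""
    # Normalize so the spectral norm is <= 1 (Frobenius norm bounds it).
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # keep the row dimension small so X @ X.T is cheap
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # polynomial update: a*X + b*(XX^T)X + c*(XX^T)^2 X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One sketched Muon update for a 2-D weight matrix:
    accumulate momentum, orthogonalize it, then take a scaled step."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Shape-dependent scale, a heuristic from the reference implementation.
    scale = max(1.0, param.shape[0] / param.shape[1]) ** 0.5
    param = param - lr * scale * update
    return param, momentum
```

In practice the orthogonalized update equalizes the contribution of different directions in the weight matrix, which is why Muon is typically applied only to hidden 2-D parameters while embeddings, norms, and biases stay on a conventional optimizer such as AdamW.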