It's a major trend shift: they are reverting to older dense models that remain comparable to or better than MoE baselines. It's a "leak" because the big Western companies figured this out a while ago and let the OSS community and China go down the MoE path, which is notably more complex and arguably somewhat worse.
Dense Models Beat MoE: Western AI Companies' Strategic Shift