Can GRPO work for diffusion LLMs (dLLMs)?
A team from UCLA and Meta says yes—and they’ve got the receipts.
They introduce d1, a framework that adapts pre-trained masked dLLMs into reasoning models via supervised fine-tuning (SFT) and reinforcement learning (RL).
The twist? Their RL method is a fresh one: diffu-GRPO.
GRPO Adapted for Diffusion LLMs: UCLA and Meta’s diffu-GRPO
