Diffusion LLMs (dLLMs) are generally faster than autoregressive LLMs (AR LLMs), but they can be made faster still. Nvidia researchers have proposed Fast-dLLM, which achieves training-free acceleration by enabling KV caching and parallel decoding. Two Key
Fast-dLLM: Nvidia Accelerates Diffusion Language Models
