MMaDA: Multimodal Large Diffusion Language Models

MMaDA introduces a new class of multimodal diffusion models that unify reasoning and generation across text and vision, achieving state-of-the-art performance in both understanding and generation tasks.

Problem: Existing
MMaDA: Multimodal Diffusion Language Models for Reasoning
