A2D-VL outperforms prior diffusion VLMs in visual question-answering while requiring significantly less training compute. Our novel adaptation techniques are critical for retaining model capabilities, finally enabling the conversion of state-of-the-art autoregressive VLMs to
A2D-VL Diffusion Model Outperforms VLMs with Efficient Training