AI Dynamics

Global AI News Aggregator


ByteDance’s iLLaDA 8B diffusion model rivals autoregressive LMs

"Improved Large Language Diffusion Models": ByteDance just brought bidirectional masked diffusion on par with autoregressive LMs. The paper, iLLaDA, trains an 8B-parameter Transformer from scratch on 12T tokens, then keeps the same denoising objective for supervised fine-tuning (SFT) on a 25B-token instruction corpus.
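For readers unfamiliar with masked diffusion, here is a minimal sketch of what a denoising objective of this kind can look like. It is an illustration under stated assumptions, not the iLLaDA implementation: the post gives no details of the loss, noise schedule, or weighting, so the 1/t weighting (borrowed from the commonly described masked-diffusion formulation), the `MASK_ID` value, and both function names are hypothetical.

```python
import numpy as np

MASK_ID = 0  # hypothetical id reserved for the [MASK] token


def mask_tokens(tokens, t, rng):
    """Replace each token with MASK_ID independently with probability t."""
    is_masked = rng.random(tokens.shape) < t
    noisy = np.where(is_masked, MASK_ID, tokens)
    return noisy, is_masked


def masked_denoising_loss(logits, tokens, is_masked, t):
    """Cross-entropy on masked positions only, weighted by 1/t.

    Assumed form; the actual iLLaDA loss may differ.
    logits: (batch, length, vocab) scores from a bidirectional model
    tokens: (batch, length) integer ids of the clean sequence
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the clean token at every position.
    token_nll = -np.take_along_axis(log_probs, tokens[..., None], axis=-1)[..., 0]
    # Only masked positions contribute; normalize by t and the token count.
    return float((token_nll * is_masked).sum() / (t * tokens.size))
```

In a training step under these assumptions, one would draw t uniformly from (0, 1], call `mask_tokens`, run the model on the noisy sequence with full bidirectional attention, and minimize `masked_denoising_loss`. Keeping the same objective for SFT, as the post describes, would mean reusing this loss on instruction data rather than switching to next-token prediction.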

→ View original post on X — @askalphaxiv