This is interesting as a first large diffusion-based LLM.
— Andrej Karpathy (@karpathy) February 27, 2025
Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different – it doesn't go left to… https://t.co/I0gnRKkh9k