It is increasingly likely that the next generation of GPTs will be neither vanilla autoregressive models nor text diffusion models, but some third, hybrid thing. If you believe this, read Subham's research. This paper takes one step in that direction: making diffusion work with a KV cache.
Next Generation GPTs: Hybrid Models Beyond Autoregressive Approaches