What if you taught transformers to reason in both latent states and tokens? This Microsoft paper adds a self-supervised objective of predicting the next latent state to the standard next-token training, where a lightweight dynamic model
Reasoning in Latent States and Tokens for Transformers
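To make the combined objective concrete, here is a minimal sketch of how a next-token loss and a next-latent-state loss could be summed. This is not the paper's implementation. The linear map `W` standing in for the lightweight dynamics model, the mean-squared-error latent term, the `weight` hyperparameter, and all function names are assumptions made for illustration.

```python
import math

def cross_entropy(logits, target):
    # Numerically stable negative log-softmax of the target index.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def predict_next_latent(W, h):
    # Hypothetical lightweight dynamics model: a single linear map W @ h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(logits, targets, latents, W, weight=0.5):
    # Next-token term: average cross-entropy over all positions.
    token_loss = sum(
        cross_entropy(l, t) for l, t in zip(logits, targets)
    ) / len(targets)
    # Next-latent term (assumed form): predict latents[t + 1] from latents[t].
    pairs = list(zip(latents[:-1], latents[1:]))
    latent_loss = sum(
        mse(predict_next_latent(W, h), h_next) for h, h_next in pairs
    ) / len(pairs)
    # `weight` is a hypothetical hyperparameter balancing the two terms.
    return token_loss + weight * latent_loss, token_loss, latent_loss

# Toy inputs: two positions over a two-token vocabulary, three latent states.
logits = [[0.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]  # maps each toy latent exactly onto the next one
total, token_loss, latent_loss = joint_loss(logits, targets, latents, swap)
```

With the toy `swap` matrix the latent term is zero, so the total reduces to the token term alone. Any other `W` adds a penalty proportional to how badly it predicts the next latent state.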
