Introducing a new adaptive computation model, AdaTape. With a Transformer-based architecture and a dynamic set of tokens to create elastic input sequences, AdaTape is simple to implement, flexible, and more efficient compared to other adaptive baselines → https://
goo.gle/3s9xe6H
AdaTape: Adaptive Transformer Model with Dynamic Token Sequences
By
–
