8/ Hierarchical Vision Transformer – pretrains vision transformers with a visual pretext task (MAE) while removing unnecessary components; the result is a simple hierarchical vision transformer that is more accurate and faster in both training and inference.
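To make the MAE pretext task concrete, here is a minimal sketch of its masking step: hide a random subset of image patches so the encoder sees only the rest and the model learns to reconstruct what is missing. The helper name `random_patch_mask`, the 0.75 mask ratio, and the 14x14 patch grid are illustrative assumptions, not details taken from the work summarized above.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Hypothetical helper for illustration: the mask ratio and the uniform
    random sampling are assumptions, not the summarized model's exact recipe.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])    # hidden from the encoder
    visible = sorted(indices[num_masked:])   # the only patches the encoder sees
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
visible, masked = random_patch_mask(14 * 14, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

Because the encoder processes only the visible patches, a high mask ratio cuts the compute spent per pretraining step, which is one reason this style of pretraining is cheap.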
Hierarchical Vision Transformer Improves Training Efficiency
