4/ Vision Transformers Need Registers – identifies artifacts in the feature maps of vision transformers, caused by tokens that the network repurposes for internal computations; the proposed solution adds extra tokens to the input sequence to fill that role.
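To make the idea of "extra tokens in the input sequence" concrete, here is a minimal sketch of the bookkeeping only. It assumes the extra tokens are placed after the patch tokens and dropped again at the output; the function names, the sizes, and the identity stand-in for the transformer layers are all hypothetical and are not taken from the paper's code.

```python
import random

EMBED_DIM = 4        # hypothetical embedding width
NUM_PATCHES = 9      # e.g. a 3x3 grid of image patches
NUM_REGISTERS = 4    # hypothetical number of extra tokens


def random_token(dim):
    """Stand-in for an embedding vector (these would be learned in a real model)."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def add_registers(cls_token, patch_tokens, register_tokens):
    """Assemble the input sequence as [CLS] + patches + registers (assumed order)."""
    return [cls_token] + list(patch_tokens) + list(register_tokens)


def strip_registers(output_tokens, num_registers):
    """Split the output into (CLS, patches) and discard the register slots."""
    end = len(output_tokens) - num_registers
    return output_tokens[0], output_tokens[1:end]


cls_token = random_token(EMBED_DIM)
patch_tokens = [random_token(EMBED_DIM) for _ in range(NUM_PATCHES)]
register_tokens = [random_token(EMBED_DIM) for _ in range(NUM_REGISTERS)]

sequence = add_registers(cls_token, patch_tokens, register_tokens)

# Placeholder for the transformer layers: identity, so only the bookkeeping is shown.
output_tokens = list(sequence)

cls_out, patch_out = strip_registers(output_tokens, NUM_REGISTERS)
print(len(sequence), len(patch_out))  # prints "14 9"
```

The extra tokens take part in attention like any other token but are never read out, which gives the network a place for internal computation without overwriting patch positions.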
Vision Transformers Need Registers for Internal Computations
