What if you could scale up a ViT in minutes instead of training a giant one from scratch? ScaleNet makes this possible by inserting weight-shared layers into pretrained ViTs with tiny adapter adjustments. It expands depth with almost no extra parameters, boosts a 2x-depth DeiT
ScaleNet: Scaling Vision Transformers with Weight Sharing