Just as large pre-trained text transformers made text-based LLMs possible, large pre-trained vision transformers have advanced swiftly. This includes Meta's fantastic work (XCiT, DINO, DINOv2, SAM), Landing AI's work on Visual Prompting, and the work of
Vision Transformers Progress: Meta and Landing AI Advances