Tarsier2: Advancing Large Vision-Language Models from Detailed Description to Comprehensive Understanding
Tarsier2 is a state-of-the-art large vision-language model (LVLM) that excels in video description and understanding, with improvements that include scaling pre-training data.
Tarsier2: Advanced Video Understanding with Large Vision-Language Models
