How well can multimodal LLMs understand long-distance travel videos? Enter VIR-Bench, a new benchmark with 200 real-world travel videos that challenges models to reconstruct itineraries and reason over extended geospatial-temporal trajectories. Why it matters: mastering
VIR-Bench: Evaluating Multimodal LLMs on Travel Video Understanding