AI Dynamics

Global AI News Aggregator

AutoGaze: Efficient Video Understanding with Selective Attention

Humans can see in high-res, high-FPS in real-time. Why can't VLMs? Introducing AutoGaze: ViTs/VLMs "gaze" only at key video regions! Up to 4-100x token savings, 19x speedup, and enables scaling to 4K-res 1K-frame videos. 📄 arxiv.org/abs/2603.12254 🌐 autogaze.github.io 🤗 huggingface.co/collections/b… (1/n)🧵

→ View original post on X — @berkeley_ai, 2026-03-24 18:46 UTC
