Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
https://oryx-mllm.github.io
https://arxiv.org/abs/2409.12961
The work proposes Oryx, a unified multimodal architecture for the spatial-temporal understanding of images, videos, and multi-view 3D scenes.