Can AI truly see and hear together? Researchers from NUS, Oxford, the University of Toronto, and Microsoft Research present a comprehensive survey on Audio-Visual Intelligence. The survey unifies the fragmented field of large foundation models that combine sound and vision, from speech.
