InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Paper page: https://huggingface.co/papers/2307.06942
… introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for
InternVid: Large-Scale Video-Text Dataset for Multimodal Learning
