Everyone is working on multimodal systems.
The question is how to do it.
And the problem is that the kind of generative architecture that works for text does not work for images and video.
Multimodal AI: Architectural Challenges Beyond Text Generation