AI Dynamics

Global AI News Aggregator

OpenAI releases multimodal model with text, audio, and vision capabilities

They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special case afterthought, if you ask it to. (fixed it for you)

→ View original post on X (@karpathy)
