They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special case afterthought, if you ask it to.
— Andrej Karpathy (@karpathy) May 13, 2024
(fixed it for you) https://t.co/0y36OId88h