Good points from @ylecun, although it may just be a matter of time before we have foundation LLMs that are trained on, and take inputs from, multiple modalities (including video and text). However, logic and reasoning, whilst improving with Chain of Thought, may still take longer to
Multimodal LLMs and reasoning advancement in foundation models