Most “multimodal” AI is just duct tape.
— Ronald van Loon (@Ronald_vanLoon) 24 mars 2026
One model for text.
Another for vision.
A pipeline to glue it together.
But what happens when text + vision are native in the same model, with a 256K context window?
That is what Moonshot AI just changed.
A thread on what I learned from… pic.twitter.com/CmnuE9FNJx
Most “multimodal” AI is just duct tape. One model for text.
Another for vision.
A pipeline to glue it together. But what happens when text + vision are native in the same model, with a 256K context window? That is what Moonshot AI just changed. A thread on what I learned from
Leave a Reply