“Thinking with Visual Primitives” New paper from DeepSeek… But got taken down? Most multimodal models can look at an image, but they still mostly reason in language. So on tasks like counting, spatial reasoning, and mazes, words alone are a weak way to keep track of visual
DeepSeek Paper Explores Visual Primitives for Multimodal Reasoning
By
–
