I don’t think you can learn to understand the visual world from text alone. But congenitally blind people learn a ton about the world without visual input, with much richer representations than I would ascribe to LLMs. Landau & Gleitman had a terrific book about this in the
Can LLMs Learn Visual Understanding Without Images?