Multimodal LLMs can write essays.
They can chat, caption, and even summarize papers. But give them a Portal map or a 3D puzzle and they break instantly. Zero percent accuracy. Welcome to the edge of AI reasoning: the MARBLE Benchmark.
Multimodal LLMs Fail 3D Puzzles: MARBLE Benchmark
