AI Dynamics

Global AI News Aggregator

ARC Grids as Token Sequences: Why VLMs Struggle with Processing

Fundamentally, it's because ARC grids aren't images, so VLMs can't make sense of them. They're 2D grids of tokens. Some people process them with 2D-native transformers (2D position embeddings, or 2D attention), with good results, but a flattened sequence is actually a very…
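To make the distinction concrete, here is a minimal sketch of the two representations the post contrasts: a flattened 1D token sequence with row separators, versus per-cell (row, col) indices for a 2D position embedding. All names and the separator-token value are illustrative assumptions, not from the original post.

```python
# Sketch: two ways to feed an ARC grid (a 2D array of color ids 0-9)
# to a transformer. Names and token values here are hypothetical.

ROW_SEP = 10  # hypothetical separator token marking the end of each row

def flatten_with_separators(grid):
    """Flatten a 2D grid row-major into a 1D token sequence,
    inserting a separator after each row so a 1D model can still
    recover column alignment."""
    seq = []
    for row in grid:
        seq.extend(row)
        seq.append(ROW_SEP)
    return seq

def positions_2d(grid):
    """Per-cell (row, col) indices, the kind of input a 2D position
    embedding (the '2D-native' approach) would consume."""
    return [[(r, c) for c in range(len(grid[r]))]
            for r in range(len(grid))]

grid = [[0, 1], [2, 3]]
print(flatten_with_separators(grid))  # [0, 1, 10, 2, 3, 10]
print(positions_2d(grid))             # [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
```

The flattened form forces the model to infer vertical adjacency from separator spacing, whereas the 2D indices make it explicit in the position encoding.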

→ View original post on X — @fchollet
