ARC Grids as Token Sequences: Why VLMs Struggle to Process Them

Fundamentally, it's because ARC grids aren't images, so VLMs can't make sense of them. They're 2D grids of tokens. Some people process them with 2D-native transformers (2D position embeddings, or 2D attention), with good results, but a flattened sequence is actually a very
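A minimal sketch of the flattening this describes: the grid is serialized row-major into a 1D token sequence, while each cell's (row, col) coordinates are kept separately so a 2D position embedding could later consume them. The function name `flatten_grid` and the use of cell colors 0-9 directly as token ids are illustrative assumptions, not a reference implementation.

```python
def flatten_grid(grid):
    """Row-major flatten of an ARC grid.

    Returns the 1D token sequence plus the (row, col) coordinate of each
    token, which a 2D position embedding could consume instead of a single
    flat index.
    """
    tokens, positions = [], []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            tokens.append(cell)       # cell colors 0-9 serve as token ids (assumption)
            positions.append((r, c))  # 2D coordinates preserved for the embedding
    return tokens, positions


# Toy 2x2 grid:
grid = [[0, 1],
        [2, 3]]
tokens, positions = flatten_grid(grid)
# tokens    -> [0, 1, 2, 3]
# positions -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Note that the flat sequence alone loses column adjacency (cells vertically adjacent in the grid end up a full row-width apart), which is exactly what the 2D coordinates are meant to restore.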