This may seem surprising to many people — if LLMs can score above human level on all those hard human exam benchmarks, why couldn't they do something as simple as ARC tasks, most of which seem trivially obvious to humans? It's because these tasks are mostly *new* — you won't
Why LLMs Fail at ARC Tasks Despite Excelling on Exams