Please note, we're not able to reproduce the 41.8% ARC-AGI-1 score claimed by the latest Qwen 3 release — neither on the public eval set nor on the semi-private set. The numbers we're seeing are in line with other recent base models. In general, only rely on scores verified by
Qwen 3 ARC-AGI score cannot be reproduced independently