Please note, we're not able to reproduce the 41.8% ARC-AGI-1 score claimed by the latest Qwen 3 release — neither on the public eval set nor on the semi-private set. The numbers we're seeing are in line with other recent base models. In general, only rely on scores verified by