Qwen3-235b Instruct Verified: 11% ARC-AGI-1 Performance, Most Cost-Effective
Official verification of Qwen3-235b Instruct: it scores 11% on ARC-AGI-1 and 1.3% on ARC-AGI-2 (semi-private sets). These numbers are in line with other SotA base models. Qwen3 stands out as the cheapest base model we tested to score above 10% on ARC-AGI-1.