We're not evaluating "models" (whatever that means), we're evaluating arbitrary AI systems. They can include whatever harnesses or tools they want. But they cannot have been handcrafted or trained for ARC-AGI-3 specifically, because then we wouldn't be testing AGI, we would be
ARC-AGI-3 Evaluation Criteria For Arbitrary AI Systems