That’s a good point. A benchmark focused on multiple-choice identification of simple shapes or single letters would probably be a better starting point than what’s in BIG-bench.