This is a great illustration of how hard it is to determine how capable AI systems are through testing. GPT-4V solves the famous dog versus muffin image problem… except it doesn’t if the image is changed so that it no longer appears in the training data… except it does, but only some of the time.
Testing AI Systems: Evaluating GPT-4V Vision Capabilities