An image contains hundreds of thousands of pixels, so nobody can afford an architecture that runs a large model over every pixel. The small models that result are in fact much stupider than GPT-4 and have trouble following even slightly complicated instructions.
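To put rough numbers on the cost argument, here is a back-of-the-envelope sketch. It assumes a 512x512 image, treats every pixel as one position the model must process, and uses the common rule of thumb of about 2 FLOPs per parameter per position for a dense forward pass. The two parameter counts are hypothetical round figures chosen for illustration, not the sizes of any real model.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a width x height image."""
    return width * height


def forward_flops(n_positions: int, n_params: float) -> float:
    """Rough cost of one dense forward pass over n_positions.

    Assumption: ~2 FLOPs per parameter per position, ignoring
    attention and any other per-position overhead.
    """
    return 2.0 * n_params * n_positions


# Hypothetical sizes, for illustration only.
LARGE_PARAMS = 1e11  # a "large" model
SMALL_PARAMS = 1e9   # a "small" model

n_pixels = pixel_count(512, 512)
large_cost = forward_flops(n_pixels, LARGE_PARAMS)
small_cost = forward_flops(n_pixels, SMALL_PARAMS)

print(f"pixels per image:      {n_pixels:,}")
print(f"large model, one pass: {large_cost:.2e} FLOPs")
print(f"small model, one pass: {small_cost:.2e} FLOPs")
print(f"cost ratio:            {large_cost / small_cost:.0f}x")
```

Under these assumptions the cost of a pass scales linearly with both the pixel count and the parameter count, which is why the per-pixel model ends up being the thing that gets shrunk.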
Image Model Limitations: Computational Constraints and Reduced Capability