AI Dynamics

Global AI News Aggregator

CLIP and LLaVA Struggle with Contextual Image Interpretation

In this example, we believe CLIP picks up on the "surprised cat pose" and predicts doubt, surprise, and fear, ignoring the context. Oddly, LLaVA also overreaches, inferring that the person in the image is sad because it is their last ski trip of the season.

→ View original post on X: @petitegeek
