Thus, when an image contains two concepts (e.g., a lemon and an eggplant) but the text prompt mentions only one of them (e.g., lemon), CLIP attempts to account for the unmentioned concept (here, the eggplant) by saying 'purple', a color commonly associated with eggplants.
CLIP’s Strategy for Handling Unmentioned Visual Concepts in Images