Augmenting CLIP with Improved Visio-Linguistic Reasoning paper page: https://huggingface.co/papers/2307.09233
… Image-text contrastive models such as CLIP are useful for a variety of downstream applications including zero-shot classification, image-text retrieval and transfer learning. However,
