VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores paper page: https://
huggingface.co/papers/2306.01
879
… Vision-language models (VLMs) discriminatively pre-trained with contrastive image-text matching losses such as P(match|text, image) have been criticized
VisualGPTScore: Vision-Language Models with Multimodal Generative Pre-Training
By
–
