FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions. The authors propose FuseCap, a novel method for enriching captions with additional visual information obtained from vision experts, such as object detectors, attribute recognizers, and Optical
FuseCap: Enriching Image Captions with Large Language Models
