And this is what a typical LM does for captioning the same image. It’s not perfect but it is more scalable and detailed. With some LLM eval filtering would look even better.
Language Models Image Captioning with LLM Evaluation Filtering
By
–

By
–

And this is what a typical LM does for captioning the same image. It’s not perfect but it is more scalable and detailed. With some LLM eval filtering would look even better.