But how well could multimodal models rerank those 100 results? In these tests, even huge LLMs trained on more curated data, such as GPT-4o, struggled. GPT-4o's precision score was only 59.6%, and that was the highest achieved by any model.