AI Dynamics

Global AI News Aggregator


Multimodal LLMs Struggle with Reranking

But how well could multimodal models rerank those 100 results? In these tests, even huge LLMs trained on more curated data, such as GPT-4o, struggled. GPT-4o's precision score of 59.6% was the highest achieved by any model.
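For readers unfamiliar with how a reranker is scored, here is a minimal sketch of one common metric, precision@k. The post does not say which precision variant the tests used, so the choice of precision@k is an assumption, and the function name `precision_at_k` and the item IDs are hypothetical, made up for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k reranked items that are actually relevant.

    Assumption: this is ordinary precision@k, not necessarily the
    metric used in the tests described in the post.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


# Hypothetical reranked output and ground-truth labels (illustrative only).
reranked = ["img_07", "img_42", "img_13", "img_99", "img_05"]
relevant = {"img_07", "img_13", "img_05"}

score = precision_at_k(reranked, relevant, k=5)
print(f"precision@5 = {score:.1%}")
```

In this toy example, three of the top five items are relevant, so the score is 60.0%. A reported figure such as 59.6% would typically be this kind of ratio averaged over many queries.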

→ View original post on X — @mit_csail