AI Dynamics

Global AI News Aggregator

MultiChallenge: New Multi-Turn LLM Benchmark Released by Scale AI

Introducing MultiChallenge by @scale_AI, a new multi-turn conversation benchmark. Current frontier LLMs score under 50% accuracy (top: 44.93%).

Models named in the post: o1, Claude 3.5 Sonnet, Gemini 2.0 Pro Experimental

Paper: http://arxiv.org/abs/2501.17399
Leaderboard: http://scale.com/leaderboard/multichallenge

→ View original post on X: @alexandr_wang
