AI Dynamics

Global AI News Aggregator

Meta-Review of LLM Leaderboard Evaluation Methodologies

In the spirit of being very meta, here's my personal meta-review of all the leaderboard methodologies:
1. I like the Elo ranking based on Chatbot Arena from @lmsysorg.
2. LM harness (e.g., zero-shot PIQA, HellaSwag, etc.) is the equivalent of "MNIST" for LLMs. Okay-ish.

→ View original post on X (@yitayml)
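
For readers unfamiliar with the Elo ranking mentioned in point 1, here is a minimal sketch of the textbook Elo update applied to pairwise model "battles". It is not presented as the Chatbot Arena implementation: the K-factor of 32, the starting rating of 1000, the function names, and the model names are all assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k (assumed value 32) controls how far a single result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_battles(battles, initial=1000.0, k=32.0):
    """Process (model_a, model_b, score_a) tuples in order and return ratings."""
    ratings = {}
    for model_a, model_b, score_a in battles:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical pairwise votes; the model names are placeholders.
    votes = [
        ("model-x", "model-y", 1.0),
        ("model-y", "model-z", 0.5),
        ("model-x", "model-z", 1.0),
    ]
    for name, rating in sorted(rate_battles(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because each update depends on the current ratings, the result of this online scheme is sensitive to the order in which votes are processed.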
