AI Dynamics

Global AI News Aggregator

MMLU Scores Incomparable: Evaluation Implementation Details Matter

23/ …not at all comparable, even if they're both called MMLU and evaluated on the same dataset. Takeaway? Evaluations are strongly tied to their implementations, down to minute details. A bare "MMLU score" tells you almost nothing about how these numbers can be compared.

→ View original post on X (@thom_wolf)
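
To make the takeaway concrete, here is a minimal sketch of how three plausible scoring rules can pick three different answers for the same multiple-choice question. It does not reproduce any specific evaluation harness. The log-probabilities are invented for illustration, and the function names (`pick_by_letter`, `pick_by_sequence_sum`, `pick_by_sequence_mean`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical log-probabilities a model assigns to the single letter token
# of each option ("A", "B", "C", "D"). Invented numbers, not real results.
LETTER_LOGPROB: Dict[str, float] = {"A": -1.2, "B": -1.4, "C": -0.9, "D": -2.0}

# Hypothetical per-token log-probabilities of each option's full answer text.
# Options have different lengths, which is what makes the rules disagree.
ANSWER_TOKEN_LOGPROBS: Dict[str, List[float]] = {
    "A": [-0.5, -0.4],
    "B": [-0.3, -0.3, -0.3, -0.3, -0.3],
    "C": [-0.9, -0.8, -0.7],
    "D": [-1.0, -1.1],
}


def pick_by_letter(letter_logprob: Dict[str, float]) -> str:
    """Choose the option whose letter token is most likely."""
    return max(letter_logprob, key=lambda k: letter_logprob[k])


def pick_by_sequence_sum(token_logprobs: Dict[str, List[float]]) -> str:
    """Choose the option whose full answer text has the highest total log-likelihood."""
    return max(token_logprobs, key=lambda k: sum(token_logprobs[k]))


def pick_by_sequence_mean(token_logprobs: Dict[str, List[float]]) -> str:
    """Choose the option with the highest length-normalized (per-token) log-likelihood."""
    return max(
        token_logprobs,
        key=lambda k: sum(token_logprobs[k]) / len(token_logprobs[k]),
    )


if __name__ == "__main__":
    print("letter token:       ", pick_by_letter(LETTER_LOGPROB))
    print("sequence sum:       ", pick_by_sequence_sum(ANSWER_TOKEN_LOGPROBS))
    print("sequence mean/token:", pick_by_sequence_mean(ANSWER_TOKEN_LOGPROBS))
```

With these toy numbers the three rules select different options for one and the same question. Only one of them can match the gold label, so accuracy over a whole dataset can shift depending on which rule an implementation uses, even though every variant would be reported as an "MMLU score".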
