AI Dynamics

Global AI News Aggregator

LLaMA 65B MMLU Benchmark Discrepancy Analysis

2/ For one evaluation, MMLU (https://arxiv.org/abs/2009.03300), the community was surprised that the leaderboard numbers for the top model, LLaMA 65B, were significantly lower than those in the published LLaMA paper: a 30% difference! We went down a rabbit hole to understand

→ View original post on X: @thom_wolf
