2/ For one evaluation, MMLU (https://arxiv.org/abs/2009.03300), the community was surprised that the leaderboard numbers for the top model, LLaMA 65B, were significantly lower than the numbers in the published LLaMA paper: a 30% difference! We went down a rabbit hole to understand
LLaMA 65B MMLU Benchmark Discrepancy Analysis