3/ It turns out the LLaMA team used the original evaluation code proposed by the authors of the MMLU benchmark (find it at https://github.com/hendrycks/test). Let's call it the "original implementation".
LLaMA Team Uses Original MMLU Benchmark Evaluation Code