Community: Evals for LLMs are broken! Academic benchmarks are not representative of real-world performance! We need better evals! Also the same community: Let's make definitive rankings & leaderboards based on just four zero-shot "LM harness" tasks! Not wanting to single
LLM Evaluation Methods: Academic Benchmarks vs Real-World Performance