New blog post where I discuss what makes a language model evaluation successful, and the "seven sins" that hinder an eval from gaining traction in the community: https://jasonwei.net/blog/evals
Had fun presenting this at Stanford's NLP Seminar yesterday!
Seven Sins of Language Model Evaluation: What Makes Evals Successful
