Next time you pick benchmarks for model evaluation, check the correlation matrix from the MixEval paper. It shows which benchmarks are good proxies for @lmsysorg Chatbot Arena performance, which is what you want when evaluating general-purpose chatbots. MixEval: https://mixeval.github.io
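
To make the idea of "correlation with Chatbot Arena" concrete, here is a minimal sketch of how one entry of such a matrix could be computed: a Spearman rank correlation between a benchmark's scores and Arena ratings for the same models. The model scores, the benchmark names `benchmark_a` and `benchmark_b`, and the helper functions are all hypothetical placeholders, not figures or code from the MixEval paper, and the paper may use a different correlation measure.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder numbers for five models; NOT real results.
arena = [1250, 1200, 1150, 1100, 1050]
benchmarks = {
    "benchmark_a": [88.0, 84.5, 80.1, 79.0, 70.2],  # same model order as arena
    "benchmark_b": [60.0, 75.0, 55.0, 72.0, 58.0],  # ordering only loosely related
}

for name, scores in benchmarks.items():
    print(f"{name}: rank correlation with Arena = {spearman(scores, arena):+.2f}")
```

A benchmark whose scores order the models the same way Arena does gets a value near 1, so under these assumptions it would be the better stand-in for Arena performance.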
MixEval Benchmark: Evaluating General-Purpose Chatbot Performance