I think the main issue with Chatbot Arena is that it evaluates individual messages instead of whole conversations. Users interact with LLMs to solve specific problems, so ideally an evaluation should capture how well that problem gets solved. → When o1 takes 2 minutes to give me an answer
Chatbot Arena Evaluation Flaw: Messages vs Problem-Solving