3. Automatic evaluation is hard
One core challenge of evaluation is defining what a good response is. For example, in skill fit assessment, the response “You’re not a good fit” is correct but not helpful. Originally, evaluation was ad hoc. Everyone could