7. Assessing Language Models on Unsolved Questions
The paper introduces a new evaluation paradigm that tests models on real unsolved questions from the wild, rather than on fixed-answer exams.
Evaluating Language Models on Real Unsolved Questions
