AI Dynamics

Global AI News Aggregator

Evaluating Language Models on Real Unsolved Questions

The paper introduces a new evaluation paradigm that tests language models on real unsolved questions from the wild, rather than on fixed-answer exams.

→ View original post on X — @dair_ai