UQ: Assessing Language Models on Unsolved Questions
This research assessed LLMs on genuinely unsolved questions rather than standard benchmarks, and today's best model was able to solve only 10 of the 500 questions!
