As LLMs get smarter, evals need to get harder.
OpenAI’s o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs. We're putting up $500K in prizes for the best questions.
Scale and CAIS Launch Humanity’s Last Exam LLM Benchmark
