AI Dynamics

Global AI News Aggregator

Scale and CAIS Launch Humanity’s Last Exam LLM Benchmark

As LLMs get smarter, evals need to get harder.
OpenAI’s o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs. We're putting up $500K in prizes for the best questions. (read on)

→ View original post on X: @alexandr_wang