The current science of evaluating AI models, which relies primarily on benchmarks, is far from optimal. In @Nature today: a new scalable method, used to assess 15 LLMs with absolute demand scales, that enhances predictive power and expandability nature.com/articles/s41586-0…
→ View original post on X — @erictopol, 2026-04-01 15:13 UTC
