AI Dynamics

Global AI News Aggregator


SimpleQA: New Open-Source Hallucinations Evaluation Benchmark

Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, so we created an eval that is simple, reliable, and easy for researchers to use. Main features of SimpleQA: 1. Very simple setup: there…

→ View original post on X by @_jasonwei