Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, so we created an eval that is simple, reliable, and easy for researchers to use. Main features of SimpleQA: 1. Very simple setup: there
SimpleQA: New Open-Source Hallucinations Evaluation Benchmark