Interesting launch! If the agent is good at "agentic search that doesn't stop until it finds what you need", consider evaluating on our BrowseComp benchmark, which measures just that! SimpleQA mainly targets models that don't browse: https://openai.com/index/browsecomp/
…
BrowseComp Benchmark for Evaluating Agent Search Capabilities