Are Agents capable enough for Data Science? ⇒ Measure their performance with DSBench
A team from Tencent AI wanted to evaluate agentic systems on data science (DS) tasks, but they noticed that existing agentic benchmarks were severely limited in several aspects: they were
Evaluating Agentic Systems on Data Science with DSBench