How well do LLM coding agents truly perform on complex, end-to-end software feature development? Researchers from the Institute of Automation, Chinese Academy of Sciences and Huawei Technologies Co., Ltd. introduce FeatureBench, a new benchmark using a scalable, test-driven
FeatureBench: Evaluating LLM Coding Agents on Complex Software Features
