10/ AgentBoard: a benchmark and open-source evaluation framework for the analytical evaluation of LLM agents. It assesses the capabilities and limitations of LLM agents and demystifies their behavior, which can help in building stronger agents.
AgentBoard: Benchmark Framework for LLM Agent Evaluation
