Do "Agent Skills" actually make your LLM agents perform better? Researchers from BenchFlow and collaborators at multiple institutions present SkillBench, a benchmark of 86 tasks across 11 domains. It measures how well "Agent Skills", structured procedural
SkillBench: Measuring LLM Agent Skills Performance Across 86 Tasks