New agents benchmark: CollaborativeAgentBench is the first benchmark to study collaborative LLM agents that work with humans over multiple turns on realistic backend programming and frontend design tasks.
CollaborativeAgentBench: First Multi-Turn Human-Agent Collaboration Benchmark