Can your AI assistant actually do real work, or just answer questions? The InternLM team at Shanghai AI Lab presents WildClawBench, a new benchmark that ditches simple Q&A. It tests AI agents in a real computer environment (with a browser, terminal, and files) where they must
WildClawBench: Testing AI Agents in Real Computer Environments