Today we release ScreenSuite, the most comprehensive evaluation suite for GUI agents (aka Computer Use agents). We packed in 13 benchmarks and 3 different environments to evaluate the full range of agentic capabilities for vision models. And it turns out, @Alibaba_Qwen models are
ScreenSuite: Comprehensive Evaluation Suite for GUI Agents
