Claude Opus 4.1 Achieves Expert-Level Performance on GDPval

On GDPval, expert graders compared outputs from leading models to human expert work. Claude Opus 4.1 delivered the strongest results, with just under half of its outputs rated as good as or better than expert work. Just as striking is the pace of progress: OpenAI’s frontier
