
This chart from Anthropic earned top spot in Wikipedia’s 'Most Deceptive Graphs' Hall of Fame 😁

Claude (@claudeai): In evals, Sonnet with an Opus advisor scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone, while costing 11.9% less per task.

Community note: The graph is misleading due to a zoom-in on BOTH the X- and the Y-axis, making the difference look much bigger than it actually is. x.com/tombielecki/st…

https://nitter.net/claudeai/status/2042308627478773808#m
