This is a very interesting analysis from Epoch. Apparently, Anthropic's models still outperform reasoning models like o1 on mathematical tasks – something that surprises me completely. I would have expected o1 to achieve better mathematical results than Sonnet 3.5, especially in
Anthropic Models Outperform o1 in Mathematical Tasks Analysis
