As benchmarks continue to get saturated, it's great to see a no-frills benchmark of 387 challenging math problems: https://github.com/protagolabs/odyssey-math/tree/main
GPT-4 scores 66% on the high-school subset, 42% on the college subset, and only 11% on the high-school competition subset.
New Math Benchmark: 387 Challenging Problems Tests GPT-4 Limits