I find it unimaginably based that the OAI Evals team keeps making benchmarks finding that Claude is better and publishing it anyway. they are 3 for 3 this year in acknowledging specifically how much Claude is better at tasks OAI care about. there is no sarcasm here folks. this
OpenAI Evals Team Publishes Benchmarks Showing Claude Superiority
By
–
Leave a Reply