Cool new coding benchmark! I always love to see new evals out in the world. Note, though, that the testing here used the June version of Sonnet rather than the latest version, so technically it isn't evaluating "current frontier models."
New Coding Benchmark Released, But Uses Older Sonnet Version