Contrived problem, I know, but: ChatGPT 4o, o1-mini, and Claude 3.5 Sonnet all get this wrong, 0 out of 3 each. ChatGPT o1 gets it right 3 out of 3.
LLM comparison: which models solved the contrived problem?