Fascinating new LLM benchmark alert! My colleagues @DCasBol and @thefillm have developed a simple yet powerful test exposing LLMs' struggles with sequential reasoning. Key findings:
• Most LLMs start erring after just 2 operations
• OpenAI o1-mini unsurprisingly
LLM Benchmark Reveals Sequential Reasoning Limitations