Our new benchmark dropped this week and it’s already exposing where even top LLMs struggle. Top score: 51.9%. Test your agent (or just try a task): https://huggingface.co/datasets/snorkelai/agent-finance-reasoning
…
New Finance Reasoning Benchmark Reveals LLM Performance Gaps