AI Dynamics

Global AI News Aggregator

Evaluating LLMs for Long-Context Tool Calling and Agentic Reliability

Which models would you recommend for longer-context tool calling? Are there any benchmarks for that which you find credible? I've not found a local model with tool calling good enough for me to trust with Claude Code or Codex, but I may not have been looking at the right options.

→ View original post on X — @simonw