AI Dynamics

Global AI News Aggregator

Advanced AI Agents Benchmark Performance Comparison

Terminal-Bench is challenging even for the most advanced agents:
• OpenAI's Codex (gpt-5-codex): 42.8% verified score
• Anthropic's Claude Code (claude-sonnet-4-5): 50.0% per their release announcement
• Leaderboard:

→ View original post on X — @snorkelai