Agent-first benchmarks are something! Curious what the protocol looks like for agents that take wildly different numbers of steps.