OK, I will ponder it! Currently it's only one turn, with an estimated 20-30 tool calls depending on whether you include file reads, and networking is definitely not the bottleneck. But yeah, slow load/inference speed gets punished. That's real life, though! I think Kimi K2.5 might have
Model Inference Performance and Tool Call Optimization Analysis