1/6 Long-horizon agentic tasks are breaking our mental models. More tokens, bigger models, and best-of-N only go so far. An orchestrated approach to test-time compute scaling is what long-horizon tasks need. Here’s what we learned using SWE-bench as a test case. Read the blog for
Orchestrated Test-Time Compute Scaling for Long-Horizon Agentic Tasks