What does every big company think about the agent harness? Anthropic, OpenAI, CrewAI, LangChain. They all build agents. They all wrap their models in infrastructure to make them useful. They each call it the harness. But they agree on one thing. And disagree on everything else.

The agreement: the model is not the product. The infrastructure around the model is. The disagreement: how much of that infrastructure should exist. This is the most important architectural bet in AI right now. And each company is placing a different one.

Anthropic bets on the model. Their harness is deliberately thin. A "dumb loop" that assembles the prompt, calls the model, executes tool calls, and repeats. The model makes all the decisions. The harness just manages turns. Their bet: as models get smarter, you need less infrastructure, not more.

OpenAI takes a similar but slightly thicker approach. Their Agents SDK is "code-first," meaning workflow logic lives in native Python, not in some graph DSL. But they add more structure: strict priority stacks for instructions, multiple orchestration modes, and explicit agent handoff patterns.

CrewAI adds a deterministic backbone. Their Flows layer handles routing and validation with hard-coded logic, while their Crews handle the autonomous parts. Intelligence where it matters, control everywhere else.

LangGraph bets on explicit control. The harness encodes the logic. Every decision point is a node in a graph. Every transition is a defined edge. Planning steps, routing strategies, and multi-step workflows are all spelled out in the harness, not left to the model.

Notice the spectrum. On one end: trust the model, keep the harness thin. On the other: encode the logic, make the harness thick. And here's where it gets interesting.

The scaffolding metaphor makes this concrete. Construction scaffolding is temporary infrastructure that lets workers reach floors they couldn't access otherwise. It doesn't do the building.
But without it, workers can't reach the upper floors. The key word is temporary. As the building goes up, scaffolding comes down.

Manus demonstrated this perfectly. They rebuilt their agent five times in six months. Each rewrite removed complexity. Complex tool definitions became simple shell commands. "Management agents" became basic handoffs. The scaffolding did its job. So they removed it.

This is also why Anthropic regularly deletes planning steps from Claude Code's harness. Every time a new model version ships that can handle something internally, the corresponding harness logic gets stripped out.

But there's a catch. Models are now trained with specific harnesses in the loop. Claude Code's model learned to use the exact scaffolding it was built with. Change the scaffolding, and performance drops. The worker trained on THIS scaffolding. Swap it out, and they stumble.

So the field is converging on a principle: build scaffolding that's designed to be removed. But remove it carefully, because the model learned to lean on it.

The "future-proofing test" for any agent system: if dropping in a more powerful model improves performance without adding harness complexity, the design is sound.

Two products using the exact same model can perform completely differently based on this one decision: how thick is the harness? LangChain changed only the infrastructure (same model, same weights) and jumped from outside the top 30 to rank 5 on TerminalBench 2.0. The model didn't improve. The scaffolding around it did.

The article below is a deep dive on agent harness engineering, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent.

Akshay (@akshay_pachaar)
x.com/i/article/204073208484…
https://nitter.net/akshay_pachaar/status/2041146899319971922#m
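For the curious, the "dumb loop" harness described above can be sketched in a few lines. This is a minimal illustration only, not any vendor's actual API: `call_model`, the message dictionaries, and the tool-call shape are all hypothetical stand-ins. The point it makes is structural: the harness only assembles context, dispatches tool calls, and repeats; every decision belongs to the model.

```python
def agent_loop(user_message, call_model, tools, max_turns=10):
    """Thin-harness sketch: assemble prompt, call model, run tools, repeat.

    `call_model` and `tools` are hypothetical stand-ins supplied by the
    caller; the harness itself contains no planning or routing logic.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = call_model(messages)          # the model makes all decisions
        messages.append(reply)
        if not reply.get("tool_calls"):       # no tools requested: we're done
            return reply["content"]
        for call in reply["tool_calls"]:      # the harness just executes
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    return None  # turn budget exhausted without a final answer
```

Everything the thicker frameworks add (priority stacks, Flows, graph nodes and edges) is logic layered on top of, or in place of, this loop.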
View original post on X · @akshay_pachaar, 2026-04-10 12:51 UTC