The first AI that improves without retraining. (it rewrites its own agent harness) Every developer I know has one thing in common: they obsess over their setup. The terminal, the scripts, the shortcuts. They don't just write code. They constantly refine how they work. The code gets better because the environment gets better. MiniMax just released M2.7, and I think the most interesting thing about it isn't a benchmark number. It's the fact that M2.7 improves its own agent harness. Autonomously. Let's break this down: When you run an AI agent today, it operates inside a "harness." Think of it as the agent's operating environment: the skills it can invoke, the tools it can call, its memory, and the rules it follows. Normally, a human engineer builds this harness, and the agent operates within it. The harness stays fixed. M2.7 treats its harness as something it can rewrite. Here's what the loop looks like: – The agent runs a task and analyzes where things went wrong – It plans changes to its own scaffold: skills, MCPs, memory – It applies those changes, runs evaluations against a benchmark – It compares the results and decides whether to keep or revert – It writes self-criticism into memory so the next round starts smarter Then it loops back and does it again. And again. Think of it like a developer who finishes a project, writes a retrospective, restructures their workflow based on what they learned, and shows up the next day with a better setup. Except the developer here is the model itself. MiniMax ran this self-optimization loop for over 100 rounds internally. Along the way, the model discovered things on its own: it systematically searched for optimal sampling parameters (temperature, penalties), wrote workflow-specific guidelines for itself (like automatically checking for the same bug pattern in other files after a fix), and even added loop detection to avoid getting stuck. No human had to tell it to do any of this. They also tested this in a more controlled setting. They had M2.7 compete in 22 ML competitions from OpenAI's MLE Bench Lite. Each trial ran for 24 hours, fully autonomous. After each iteration, the agent wrote a memory file and performed self-criticism, feeding those insights into the next round. With every round, the ML models it trained achieved higher medal rates. The best run earned 9 gold medals. I've summarized the self-evolving architecture in the graphic below. The reason I find this compelling: this isn't about making a smarter model. It's about making a model that makes itself smarter. The weights never change. What changes is the system around it: better skills, better memory, better workflow rules. And that distinction matters because it means the improvement loop can run continuously without any retraining. We're entering a phase where agents don't just follow instructions. They redesign their own playbook. If you want to learn more, I've shared a link to their official blog post in the next tweet.
→ View original post on X — @akshay_pachaar, 2026-03-31 13:07 UTC