Can AI truly empower robots to invent their own solutions?

George Jiayuan Gao, Tianyu Li, and colleagues from UPenn present VLMgineer. This framework uses Vision Language Models (VLMs) to brainstorm initial tool designs and action plans, then refines them with evolutionary search in simulation, optimizing both the tool's geometry and how the robot uses it.

VLMgineer consistently outperforms both existing human-crafted tools and designs that a VLM generates from human specifications across diverse, challenging everyday manipulation tasks, turning complex robotics problems into straightforward executions.

VLMgineer: Vision Language Models as Robotic Toolsmiths
Project: vlmgineer.github.io
Paper: arxiv.org/abs/2507.12644
Our report: mp.weixin.qq.com/s/FXdeQhAeq…

📬 #PapersAccepted by Jiqizhixin
→ View original post on X — @jiqizhixin, 2026-04-03 14:45 UTC
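
To make the "brainstorm, then refine with evolutionary search" idea in the post concrete, here is a toy Python sketch. It is not the VLMgineer implementation: `propose_initial` merely stands in for the VLM's brainstorming step by sampling at random, `simulate_reward` stands in for a physics simulator with an arbitrary made-up target, and the three parameters (`tool_length`, `hook_angle`, `push_distance`) are hypothetical names chosen only to show tool geometry and robot action being optimized together.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One joint design: tool geometry plus how the robot uses it (all hypothetical)."""
    tool_length: float    # geometry, metres
    hook_angle: float     # geometry, degrees
    push_distance: float  # action, metres


def clamp(value, low, high):
    return min(max(value, low), high)


def propose_initial(rng, n):
    """Stand-in for the VLM brainstorming step: random candidates, no model call."""
    return [
        Candidate(rng.uniform(0.05, 0.5), rng.uniform(0.0, 90.0), rng.uniform(0.0, 0.5))
        for _ in range(n)
    ]


def simulate_reward(c):
    """Stand-in for a simulator rollout: toy reward peaking at an arbitrary target."""
    return -(
        (c.tool_length - 0.30) ** 2
        + ((c.hook_angle - 45.0) / 90.0) ** 2
        + (c.push_distance - 0.20) ** 2
    )


def mutate(c, rng, scale=0.05):
    """Perturb geometry and action together, keeping values in range."""
    return Candidate(
        clamp(c.tool_length + rng.gauss(0, scale), 0.05, 0.5),
        clamp(c.hook_angle + rng.gauss(0, scale * 90.0), 0.0, 90.0),
        clamp(c.push_distance + rng.gauss(0, scale), 0.0, 0.5),
    )


def evolve(seed=0, population=8, elites=2, generations=20):
    """Elitist evolutionary search: keep the best candidates, mutate them, repeat."""
    rng = random.Random(seed)
    pop = propose_initial(rng, population)
    history = []  # best reward seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=simulate_reward, reverse=True)
        history.append(simulate_reward(ranked[0]))
        parents = ranked[:elites]
        children = [mutate(rng.choice(parents), rng) for _ in range(population - elites)]
        pop = parents + children
    return max(pop, key=simulate_reward), history


if __name__ == "__main__":
    best, history = evolve()
    print(best)
    print("first vs. last best reward:", history[0], history[-1])
```

Because the elites are carried over unchanged, the best reward can never get worse from one generation to the next. In a real system the random proposer and the toy reward would be replaced by VLM calls and simulator rollouts, which this sketch deliberately leaves out.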
