Yes; heavy tool use. It seems to have mostly solved it via code (PIL, cv2), while using multimodal intuition to debug. E.g., one attempt within the CoT generates a path that simply traverses the outside of the maze, but it recognizes on its own that this is wrong and refines the code.
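To make that failure mode concrete, here is a rough sketch of how it can arise and be caught. This is not the model's actual code, which isn't shown here: a small text grid stands in for the image it processed with PIL/cv2, and a bounding-box test stands in for the visual check it did by looking at the rendered path. The maze layout and the names `MAZE`, `wall_bbox`, `bfs`, and `leaves_maze` are all made up for illustration.

```python
from collections import deque

# Hypothetical maze: '#' is wall, '.' is open. The outer ring of '.' mimics
# the blank margin around a maze image, which a naive search can walk through.
MAZE = [
    "...........",
    ".#.#######.",
    ".#.......#.",
    ".#######.#.",
    ".#.......#.",
    ".#.#######.",
    ".#.......#.",
    ".#######.#.",
    "...........",
]
START, GOAL = (1, 2), (7, 8)  # gaps in the top and bottom walls


def wall_bbox(grid):
    """Bounding box (top, left, bottom, right) of all wall cells."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, ch in enumerate(row) if ch == "#"]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def bfs(grid, start, goal, bbox=None):
    """Shortest path over open cells, optionally confined to bbox."""
    if bbox is None:
        bbox = (0, 0, len(grid) - 1, len(grid[0]) - 1)
    top, left, bottom, right = bbox
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:  # walk the predecessor links back to start
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = top <= nr <= bottom and left <= nc <= right
            if inside and grid[nr][nc] != "#" and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


def leaves_maze(path, bbox):
    """True if any cell of the path lies outside the wall bounding box."""
    top, left, bottom, right = bbox
    return any(not (top <= r <= bottom and left <= c <= right)
               for r, c in path)


bbox = wall_bbox(MAZE)
naive = bfs(MAZE, START, GOAL)  # free to use the margin
# If the first attempt escapes the maze, retry with the search confined to it.
fixed = bfs(MAZE, START, GOAL, bbox) if leaves_maze(naive, bbox) else naive
print(len(naive), len(fixed))
```

In this toy grid the unconstrained search takes the shorter route around the margin, which is the "traverses the outside of the maze" result described above; confining the search to the walls' bounding box forces the path through the corridors. On a real image the equivalent fix might be cropping to the maze or masking the border before searching, but that is a guess at what the refinement looked like.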
Multimodal AI debugging code to solve a maze