The key insight behind getting computer-use models to work was teaching the machine to count pixels: if it can identify exactly where on the screen an interaction needs to happen, you can get it to click, type, drag and drop, and scroll on the computer. I think this will have broader implications.
Pixel Counting: The Key to Computer Use AI Models
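To make the pixel-counting idea concrete, here is a minimal sketch of the last step: turning a pixel position the model picked out in a screenshot into a click on the real screen. It assumes the model saw a downscaled screenshot, which is a common setup but an assumption here, and the names `Click` and `to_screen` are hypothetical, not part of any particular model's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """Hypothetical click action, expressed in real screen coordinates."""
    x: int
    y: int


def to_screen(point, shot_size, screen_size):
    """Map a pixel in the screenshot the model saw to a pixel on the real screen."""
    px, py = point
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= px < shot_w and 0 <= py < shot_h):
        raise ValueError(f"point {point} lies outside the {shot_w}x{shot_h} screenshot")
    # Scale each axis independently, then clamp so rounding cannot leave the screen.
    x = min(screen_w - 1, round(px * screen_w / shot_w))
    y = min(screen_h - 1, round(py * screen_h / shot_h))
    return Click(x, y)


# The model points at the centre of a 1280x720 screenshot of a 2560x1440 display.
print(to_screen((640, 360), (1280, 720), (2560, 1440)))  # Click(x=1280, y=720)
```

Typing, dragging, and scrolling would follow the same pattern: once the coordinates are exact, the action itself is a simple call into whatever input layer drives the machine.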