One way of thinking about what AI will automate first is via the “description-execution gap”: how much easier is it to describe the task than to actually do it?

Tasks with large description-execution gaps will be ripe for automation because it’s easy to create training data and the value of automating them is huge, even if execution is non-trivial:

– Fixing grammar mistakes in a long piece of writing
– Submitting receipts for reimbursement
– Training a model that achieves performance of X on a standard evaluation benchmark
– Building an app whose UI is easy to check but which requires a lot of moving parts in the backend

Description-execution gaps tend to be small when the task is high-context and not technically challenging. The value of automating these is by definition smaller, and it’s harder to create data for them. For example:

– Data processing scripts where the code to process the data is shorter and more precise than a natural language description
– Running an ablation study in a high-context codebase that trains specialized models
– Editing a video in a specific style (often easier to edit the video yourself than to describe how each little edit should be done)
– Buying Chinese groceries for my mom (she has very specific items and amounts, so it's easier for her to go herself than to describe to me the exact items, how to select the best fruit, etc.)

This is a bit similar to the discriminator-generator gap, but not exactly the same. Some things, like editing a video in a specific style, can have a large discriminator-generator gap (it is easy to judge whether the result looks right) but a small description-execution gap.
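To make the data-processing example concrete, here is a minimal sketch of a small-gap task. The cleaning rules, the `DESCRIPTION` text, and the `clean` function are all hypothetical and invented for illustration; they are not from the original post. The point is only that the prose spec and the code are about equally long, and the code is the less ambiguous of the two.

```python
import re

# Hypothetical natural-language spec for a small cleaning step (illustrative only).
DESCRIPTION = (
    "For each record, trim surrounding whitespace from the name, lowercase it, "
    "collapse any run of internal whitespace into a single space, drop records "
    "whose name is then empty, and keep only the first record for each name."
)


def clean(names):
    """Apply the cleaning rules described in DESCRIPTION to a list of names."""
    normalized = (re.sub(r"\s+", " ", n.strip().lower()) for n in names)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(n for n in normalized if n))


if __name__ == "__main__":
    print(clean(["  Ada  Lovelace", "ada lovelace", "", "Alan Turing "]))
```

Writing `DESCRIPTION` carefully enough for someone else to implement it takes roughly as much effort as writing `clean` directly, which is what a small description-execution gap looks like in practice.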
→ View original post on X — @_jasonwei, 2025-06-18 19:24 UTC