Measuring AI Ability to Complete Long Tasks

This paper introduces a new metric, the 50%-task-completion time horizon, to measure AI's ability to complete long tasks. The metric is the time humans typically take to complete tasks that AI models can complete with a 50% success rate.
New Metric Measures AI Long-Task Completion Ability
