Introducing MLE-bench, a new benchmark from @OpenAI to evaluate the performance of AI agents across 75 ML engineering tasks. Excited to have the authors discussing their work here! @junshernchan @thelokasiffers @JaffeOliver @jjamesaung @danesherbs @evanon0ping @ChowdhuryNeil
OpenAI Launches MLE-bench for Evaluating AI Agents on ML Engineering
