Proud that Snorkel contributed as authors to this work on evaluating agents on realistic, long-horizon terminal tasks, where even strong models struggle to reliably complete end-to-end workflows. Thanks @Mike_A_Merrill @alexgshaw @laudeinstitute @stanfordailab for the opportunity!
Snorkel Advances Agent Evaluation for Long-Horizon Tasks