Thanks for this research and for building practical measures of frontier model capabilities. These days we're focusing on the failure taxonomy you highlighted in the paper: testing how elements like structured programmatic plans and in-flow validators and reducers can help agents
AI21 Labs focuses on frontier model capabilities failure taxonomy and agent validation