AI Dynamics

Global AI News Aggregator

Agent Evals: Offline Tests and Real-World Success Correlation

Fair, it's all definitions 🙂 But evals in the sense that most product builders think of them (static offline envs/test sets you run an agent version against) tend not to correlate super well with real-world success.
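One rough way to check this claim for your own agent is to compare offline eval scores against a real-world success metric across released versions. The sketch below computes a Spearman rank correlation in plain Python. The metric names and all numbers are hypothetical placeholders, not data from the post, and a handful of versions is far too few to draw firm conclusions.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: one entry per agent version, oldest first.
offline_pass_rate = [0.62, 0.70, 0.74, 0.81, 0.85]   # static test-set score
real_world_success = [0.41, 0.39, 0.47, 0.40, 0.44]  # e.g. task completion in production

rho = spearman(offline_pass_rate, real_world_success)
print(f"Spearman rho = {rho:.2f}")
```

A rho near 1 would mean the offline eval orders versions the same way production does. A value near 0 would match the weak correlation described above.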

→ View original post on X: @mattshumer_