AI Dynamics

Global AI News Aggregator

Evals Disconnect From Real User Utility In AI

Often, evals are very disconnected from actual utility. For example, we had an eval for a while that measured 'writing style'. Basically, how well do we prevent AI slop in writing output? We maxed out the eval, put the model in prod, and users hated it.

→ View original post on X (@mattshumer_)
