AI Dynamics

Global AI News Aggregator

About

Production Performance vs Academic Benchmarks Eval Correlation

What production performance are you referring to then? Or are you referring to academic benchmarks being bad in general? Because you would always need some eval benchmark ideally correlated to the prod use case.

→ View original post on X — @yitayml