I’d say publicly available benchmarks and data are basically just sanity checks. As I mentioned in the conclusion, ideally you want (or rather need) to set up the eval with proprietary data related to your business problem, to make sure the performance is not due to
Proprietary Data Essential for Meaningful Model Evaluation Beyond Public Benchmarks