It’s kind of a dilemma. You want to check independent evals because the developer's own numbers might be inflated for marketing purposes.
At the same time, independent evals may undersell the LLM through accidental bad prompting, bad batching, bad optimization, etc.
The Challenge of Evaluating LLMs: Balancing Marketing Claims and Independent Testing