Totally agree. I was just talking recently about the need for closed evaluations and unknown benchmarks: some kind of LLM comparison test that can't be gamed so easily.