> "what hit most was that it got uploaded without coauthor permission"

Sure, that was another issue. But wasn't the core issue the flawed evaluation setup the authors used? Using GPT-4 to score its own answers? And prompting repeatedly until the answer was correct, thus reaching 100%?
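
To make the retry point concrete, here is a rough sketch of how re-prompting until an answer is correct inflates a score. It assumes each attempt is independent and succeeds with the same probability; the function name and the numbers are illustrative assumptions, not figures from the paper.

```python
def pass_rate_with_retries(p_single: float, max_attempts: int) -> float:
    """Chance that at least one of `max_attempts` tries is correct.

    Assumes (hypothetically) that tries are independent and each one
    is correct with probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # All attempts fail with probability (1 - p)^k; the complement is
    # the chance that a retry-until-correct loop reports a success.
    return 1.0 - (1.0 - p_single) ** max_attempts


if __name__ == "__main__":
    # Illustrative only: a model that is right half the time per attempt.
    for attempts in (1, 3, 10):
        print(attempts, round(pass_rate_with_retries(0.5, attempts), 4))
```

Under these assumptions the reported score climbs toward 100% as the retry budget grows, without the model getting any better. And a loop like this can only stop at the right moment if the grader already knows the correct answer, so the stopping rule itself leaks the ground truth into the evaluation.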
Critique of methodological problems in the GPT-4 evaluation