perhaps bc lmsys performance is less a judgment on general capabilities than it is of rlhf choices i dug into the data a little for curiosity sake – always worth remembering how short (and single turn) most of these queries are.
LMSYS Performance Reflects RLHF Choices, Not General Capabilities
By
–
Leave a Reply