For example, Gemma 2 2B IT answers correctly while the 9B version fails. I'd have expected models optimized for the @lmsysorg Chatbot Arena to perform worse, but that's not always the case.
Gemma 2 Model Size Performance Variations and Arena Optimization
