Model performance comparisons are tricky without knowing exact sizes; it could be an apples-to-oranges case. Plus, you can always push numbers with inference scaling. Normalizing by tokens/sec would be more meaningful.
Model Performance Comparisons: Size Normalization and Inference Scaling Challenges
By
–