Inferring Model Size from Speed and Pricing is Complex
I think model size is hard to infer from speed and pricing. It depends on:
– GPU type
– precision and quantization at inference
– batch scaling
– subsidized or not
– MoE or not
– etc.
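
A rough sketch of why the inference is ambiguous, assuming single-stream decoding is limited by memory bandwidth (each generated token reads every active weight once). The function name, the parameter counts, and the bandwidth figures are all made up for illustration and do not describe any real model or GPU. Batch scaling and subsidized pricing are not modeled at all, and they would blur the picture further.

```python
def decode_tokens_per_second(active_params_b, bytes_per_param, mem_bandwidth_gb_s):
    """Back-of-the-envelope ceiling on decode speed for one request.

    Assumes decoding is memory-bandwidth bound: every generated token
    requires reading all active weights once. Ignores KV cache traffic,
    batching, and multi-GPU parallelism.
    """
    bytes_read_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_read_per_token


# Hypothetical setups: (description, active params in billions,
# bytes per parameter, memory bandwidth in GB/s).
setups = [
    ("70B dense, 16-bit, fast GPU", 70.0, 2.0, 3200.0),
    ("70B dense, 4-bit, slower GPU", 70.0, 0.5, 800.0),
    ("280B-total MoE with 17.5B active, 16-bit, slower GPU", 17.5, 2.0, 800.0),
]

speeds = {
    name: decode_tokens_per_second(active, nbytes, bandwidth)
    for name, active, nbytes, bandwidth in setups
}

for name, tokens_per_s in speeds.items():
    print(f"{name}: {tokens_per_s:.1f} tokens/s")
```

Under these assumptions all three setups land on the same estimated speed, even though total parameter counts range from 70B to 280B. Observed tokens per second alone cannot tell them apart.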