This is expected, though — the best models are undersized and overtrained because you expect to use them; you optimize for inference compute instead of just train
Undersized overtrained models optimize for inference compute
By
–
By
–
This is expected, though — the best models are undersized and overtrained because you expect to use them; you optimize for inference compute instead of just train