Sometimes I take for granted how quickly we can ship great product, vs how hard it is to tune a super high throughput inference + api stack. The scale makes the latter really hard. we’re working around the clock to make it better.
Scaling High Throughput AI Inference Infrastructure Challenges
By
–