TGI: Fastest Open Source LLM Inference Generation Server

You should use it in the fastest OSS inference generation server out there: TGI (https://github.com/huggingface/text-generation-inference). Olivier, the maintainer, has been optimizing it with all his secret knowledge (well, not so secret, since it's an open-source repository hahaha).
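To make the recommendation concrete, here is a minimal sketch of querying a running TGI server over its HTTP `/generate` endpoint, assuming a server has already been launched as described in the repository's README; the host, port, prompt, and token budget below are placeholder values, not anything from the original post.

```python
import requests

# Placeholder endpoint: assumes a TGI server is already running locally,
# e.g. started with the Docker image from the repository linked above.
TGI_URL = "http://127.0.0.1:8080/generate"

payload = {
    "inputs": "Why use an optimized inference server?",
    "parameters": {"max_new_tokens": 64},
}

# TGI's /generate endpoint returns JSON with a "generated_text" field.
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```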