If you find Falcon too slow out of the box in the transformers library, try TGI (Text Generation Inference), a fast, production-grade open-source inference library for LLMs. Check it out here: https://github.com/huggingface/text-generation-inference
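To give a feel for what using TGI looks like from the client side, here is a minimal sketch in Python. It assumes a TGI server is already running and serving a model, and that it is reachable at `http://127.0.0.1:8080` (the host and port are placeholders, not something the post specifies). The helper names `build_generate_request` and `generate` are made up for illustration; the sketch only relies on TGI's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field.

```python
import json
import urllib.request

# Assumption: a TGI server is already running and reachable at this address.
TGI_URL = "http://127.0.0.1:8080"  # hypothetical host and port


def build_generate_request(prompt, max_new_tokens=50, base_url=TGI_URL):
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=50, base_url=TGI_URL, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]
```

With a server up, calling `generate("What is deep learning?", max_new_tokens=20)` should return the completion as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.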
…
TGI: Fast Production-Grade Open Source LLM Inference Library