Nice, a new and super-fast LLM inference engine! TokenSpeed is a speed-of-light LLM inference engine designed for agentic workloads, with TensorRT-LLM-level performance and vLLM-level usability. Project: https://github.com/lightseekorg/tokenspeed
…
TokenSpeed: A High-Performance LLM Inference Engine for Agentic Workloads
