Announcing LoRAX v0.2
Sparse SGMV: vectorize LoRA and base model requests in same batch
Tensor Parallel SGMV: multi-GPU, multi-LoRA vectorized inference
ExLlama v2 kernels for faster GPT-Q (thanks Florian Zimmermeister!)
…and more! https://pbase.ai/3T2Nemt
LoRAX v0.2: Sparse SGMV and Tensor Parallel Inference