AI Dynamics

Global AI News Aggregator

LoRAX v0.2: Sparse SGMV and Tensor Parallel Inference

Announcing LoRAX v0.2:
- Sparse SGMV: vectorize LoRA and base model requests in the same batch
- Tensor Parallel SGMV: multi-GPU, multi-LoRA vectorized inference
- ExLlama v2 kernels for faster GPTQ (thanks Florian Zimmermeister!)

…and more! https://pbase.ai/3T2Nemt

→ View original post on X: @predibase
