KVCache quantization is a no-no as well: I’d rather quantize the model to 2-bit than quantize the KVCache to 4-bit or even 8-bit.
AI Educational Outreach: Lectures, Essays, Blogs, and Social Media