Google's new KV-cache optimization broke the DRAM stocks, but how does it work? Let's take a quick look. "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" TurboQuant combines two ideas from two earlier lines of work: PolarQuant and Quantized
Google’s KV-Cache Optimization: TurboQuant Vector Quantization Explained
