Compression is one way to fight the KV cache wall. The other is to stop throwing the KV cache away. LMCache reuses it and offloads it to CPU memory, disk, or S3 instead of evicting it. It already plugs into vLLM, SGLang, and NVIDIA Dynamo. Worth a star if you serve LLMs.
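To make "offload instead of evict" concrete, here is a toy sketch of the general idea. It is not LMCache's API or implementation. `TieredKVCache` is a hypothetical class, the "hot" tier stands in for GPU memory, the "cold" tier stands in for CPU RAM, disk, or S3, and plain strings stand in for prefix keys and KV tensors. When the hot tier fills up, the least recently used entry is demoted to the cold tier rather than discarded, so a later request with the same prefix can fetch it back instead of recomputing it.

```python
from collections import OrderedDict


class TieredKVCache:
    """Hypothetical two-tier LRU cache illustrating offload-instead-of-evict.

    hot  ~ stand-in for GPU memory (small, fast)
    cold ~ stand-in for CPU RAM / disk / object storage (larger, slower)
    """

    def __init__(self, hot_capacity, cold_capacity):
        self.hot_capacity = hot_capacity
        self.cold_capacity = cold_capacity
        self.hot = OrderedDict()   # prefix key -> KV blob, oldest first
        self.cold = OrderedDict()  # same layout for the offload tier
        self.dropped = 0           # entries that fell out of the last tier

    def put(self, prefix, kv):
        # Drop any stale offloaded copy so a key lives in exactly one tier.
        self.cold.pop(prefix, None)
        self.hot[prefix] = kv
        self.hot.move_to_end(prefix)
        self._rebalance()

    def get(self, prefix):
        """Return cached KV for a prefix, or None on a miss (caller recomputes)."""
        if prefix in self.hot:
            self.hot.move_to_end(prefix)
            return self.hot[prefix]
        if prefix in self.cold:
            # Hit in the offload tier: promote it back instead of recomputing.
            kv = self.cold.pop(prefix)
            self.put(prefix, kv)
            return kv
        return None

    def _rebalance(self):
        # Hot tier over capacity: demote the LRU entry rather than discarding it.
        while len(self.hot) > self.hot_capacity:
            key, kv = self.hot.popitem(last=False)
            self.cold[key] = kv
        # Only the last tier truly evicts.
        while len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)
            self.dropped += 1


cache = TieredKVCache(hot_capacity=2, cold_capacity=4)
cache.put("prefix-a", "kv_a")
cache.put("prefix-b", "kv_b")
cache.put("prefix-c", "kv_c")   # demotes prefix-a to the cold tier
print(cache.get("prefix-a"))    # served from the cold tier, then promoted
```

With a plain single-tier LRU, the `get("prefix-a")` above would be a miss and the prefill would have to be redone; here it costs only a copy back from the slower tier. A real system also has to weigh transfer bandwidth against recompute cost, which this sketch ignores.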