AI Dynamics

Global AI News Aggregator

LMCache offloads the KV cache to CPU, disk, or S3 instead of evicting it

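Compression is one way to fight the KV cache memory wall. The other is to not throw the KV cache away at all. Instead of evicting KV cache entries when GPU memory runs out, LMCache offloads them to CPU memory, disk, or S3 and reuses them later, so context that was already computed does not have to be recomputed. It already plugs into vLLM, SGLang, and NVIDIA Dynamo. Worth a star if you serve LLMs.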
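To make the "offload instead of evict" idea concrete, here is a toy sketch in plain Python. It is not LMCache's API or implementation: `TieredKVCache` and `chunk_key` are hypothetical names, two `OrderedDict`s stand in for GPU and CPU memory, a local directory stands in for disk, S3 is omitted, and strings stand in for the KV tensors. An entry pushed out of a full tier is demoted to the next tier rather than dropped, and a hit in a lower tier is promoted back to the top.

```python
import hashlib
import os
import pickle
import tempfile
from collections import OrderedDict


def chunk_key(token_ids):
    """Hash a chunk of token IDs into a cache key (hypothetical keying scheme)."""
    return hashlib.sha256(repr(list(token_ids)).encode()).hexdigest()


class TieredKVCache:
    """Hypothetical toy three-tier cache: 'gpu' -> 'cpu' -> 'disk'.

    Not LMCache's API. Entries pushed out of a full tier are demoted to the
    next tier instead of being discarded; the disk tier is unbounded here.
    """

    def __init__(self, gpu_capacity, cpu_capacity, disk_dir):
        self.gpu = OrderedDict()  # stands in for GPU memory, kept in LRU order
        self.cpu = OrderedDict()  # stands in for host RAM, kept in LRU order
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity
        self.disk_dir = disk_dir

    def _disk_path(self, key):
        return os.path.join(self.disk_dir, key + ".pkl")

    def _put_cpu(self, key, kv):
        self.cpu[key] = kv
        self.cpu.move_to_end(key)
        if len(self.cpu) > self.cpu_capacity:
            old_key, old_kv = self.cpu.popitem(last=False)  # least recently used
            # Offload to disk rather than evicting outright.
            with open(self._disk_path(old_key), "wb") as f:
                pickle.dump(old_kv, f)

    def put(self, key, kv):
        self.gpu[key] = kv
        self.gpu.move_to_end(key)
        if len(self.gpu) > self.gpu_capacity:
            old_key, old_kv = self.gpu.popitem(last=False)  # least recently used
            # Offload to the CPU tier rather than discarding.
            self._put_cpu(old_key, old_kv)

    def get(self, key):
        """Return (kv, tier_hit) or (None, None); lower-tier hits move back to 'gpu'."""
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.cpu:
            kv = self.cpu.pop(key)
            self.put(key, kv)
            return kv, "cpu"
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                kv = pickle.load(f)
            os.remove(path)
            self.put(key, kv)
            return kv, "disk"
        return None, None


with tempfile.TemporaryDirectory() as d:
    cache = TieredKVCache(gpu_capacity=1, cpu_capacity=1, disk_dir=d)
    ka, kb, kc = chunk_key([1, 2]), chunk_key([3, 4]), chunk_key([5, 6])
    cache.put(ka, "kv-A")  # gpu: A
    cache.put(kb, "kv-B")  # gpu: B, cpu: A
    cache.put(kc, "kv-C")  # gpu: C, cpu: B, disk: A
    print(cache.get(ka))   # ('kv-A', 'disk'): reused from disk, not recomputed
```

With plain eviction, chunk A would be gone after the third `put` and would have to be recomputed; here it is still recoverable from the slowest tier. A real system would also have to weigh transfer cost against recompute cost, which this sketch ignores.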

Original post on X by @akshay_pachaar