AI Dynamics

Global AI News Aggregator


KV Cache Memory Overhead in Large Language Models Explained

1,000 tokens of text, compressed at 1.2 bits/char with 4 chars/token ≈ 600 bytes.
1,000 tokens of KV cache in LLaMA 70B at fp16 take up 8192 dim × 80 layers × 2 (K and V) × 16 bits × 1,000 tokens ≈ 2.6 GB.
So the KV cache is 2.6 GB / 600 bytes ≈ 4.4 million times larger than the input. Make this make sense.
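The arithmetic above can be checked with a short back-of-envelope script. It is a sketch that uses the post's own assumptions as defaults: 4 chars/token, 1.2 bits/char, hidden size 8192, 80 layers, one key and one value vector per token per layer, and 2 bytes per fp16 value. The function names and the `kv_fraction` parameter are hypothetical, added only for illustration. The post's figure treats the cache as full multi-head attention. The released Llama 2 and Llama 3 70B models use grouped-query attention, with 8 key/value heads shared across 64 query heads, so `kv_fraction=8/64` shows how much that shrinks the cache.

```python
# Back-of-envelope sizes; every default below is an assumption taken from the post.


def compressed_text_bytes(n_tokens: int, chars_per_token: float = 4.0,
                          bits_per_char: float = 1.2) -> float:
    """Bytes needed for n_tokens of text under a compressor achieving bits_per_char."""
    return n_tokens * chars_per_token * bits_per_char / 8


def kv_cache_bytes(n_tokens: int, d_model: int = 8192, n_layers: int = 80,
                   bytes_per_value: int = 2, kv_fraction: float = 1.0) -> float:
    """Bytes of KV cache: one key and one value vector per token per layer.

    kv_fraction (hypothetical parameter) scales the K/V width relative to
    d_model, e.g. 8/64 for grouped-query attention with 8 KV heads out of 64.
    """
    # Leading 2 = one key vector plus one value vector.
    per_token = 2 * d_model * kv_fraction * n_layers * bytes_per_value
    return n_tokens * per_token


text = compressed_text_bytes(1000)
kv = kv_cache_bytes(1000)
print(f"compressed text: {text:.0f} bytes")
print(f"KV cache (full MHA): {kv / 1e9:.2f} GB")
print(f"ratio: {kv / text / 1e6:.2f} million x")

kv_gqa = kv_cache_bytes(1000, kv_fraction=8 / 64)
print(f"KV cache (GQA, 8 of 64 heads): {kv_gqa / 1e6:.0f} MB")
print(f"ratio: {kv_gqa / text / 1e3:.0f} thousand x")
```

With the defaults this reproduces the post's figures: about 2.62 GB of cache (roughly 2.6 MB per token) and a ratio of about 4.37 million. Even with grouped-query attention the cache is about 328 MB, still more than half a million times the compressed text.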

Source: original post on X by @jxmnop