AI Dynamics

Global AI News Aggregator

Token Caching Challenges in Large Language Models

Not a dumb question at all. I think caching is the trickiest one here (obvious ones like KV caching aside). Caching token embeddings of common words probably doesn’t really help much. And prompts are probably often diverse enough that caching those would be too expensive. Session
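
To make the point about prompt diversity concrete, here is a minimal sketch of an exact-match prompt cache. It is only an illustration under stated assumptions: `PromptCache`, `fake_llm`, and `measure` are hypothetical names invented for this example, the prompts are synthetic, and a real serving stack would likely cache at a different granularity.

```python
from collections import OrderedDict


class PromptCache:
    """Tiny exact-match LRU cache keyed by the full prompt string (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        if prompt in self.store:
            # Cache hit: mark the entry as most recently used.
            self.hits += 1
            self.store.move_to_end(prompt)
            return self.store[prompt]
        # Cache miss: pay for the full computation and store the result.
        self.misses += 1
        result = compute(prompt)
        self.store[prompt] = result
        if len(self.store) > self.capacity:
            # Evict the least recently used entry.
            self.store.popitem(last=False)
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_llm(prompt):
    # Stand-in for an expensive model call (assumption, not a real API).
    return prompt.upper()


def measure(prompts, capacity=100):
    cache = PromptCache(capacity)
    for p in prompts:
        cache.get_or_compute(p, fake_llm)
    return cache.hit_rate()


# Synthetic workloads: one repeated prompt vs. 50 distinct prompts.
repetitive = ["What is KV caching?"] * 50
diverse = [f"Question number {i} about topic {i * 7}" for i in range(50)]

repetitive_rate = measure(repetitive)
diverse_rate = measure(diverse)
print(f"repetitive: {repetitive_rate:.2f}, diverse: {diverse_rate:.2f}")
```

With identical prompts the cache pays off almost every time, while with distinct prompts every request is a miss and each stored entry only consumes memory, which is one way to read the "too expensive" concern above.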

→ View original post on X — @rasbt