Not a dumb question at all. I think caching is the trickiest one here (obvious ones like KV-caching aside). Caching token embeddings of common words probably doesn't really help much. And prompts are probably often diverse enough that caching those would be too expensive. Session
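To make the tradeoff concrete, here's a toy sketch (names and structure are my own, purely illustrative) of prompt-prefix caching: reusable state is keyed on the prompt's leading tokens, so hits only happen when prompts actually share a prefix. With diverse prompts, hit rates stay low while the cache still pays the full storage cost.

```python
# Toy prefix cache: key on the first few prompt tokens and reuse a
# stand-in "state" (in a real system this would be KV tensors).
cache = {}

def get_prefix_state(tokens, prefix_len=4):
    """Return (state, hit) for the first prefix_len tokens of a prompt."""
    key = tuple(tokens[:prefix_len])
    if key in cache:
        return cache[key], True          # cache hit: prefix seen before
    state = f"state-for-{key}"           # placeholder for real KV state
    cache[key] = state
    return state, False                  # cache miss: diverse prefix

prompts = [
    ["translate", "to", "french", ":", "hello"],
    ["translate", "to", "french", ":", "goodbye"],  # shares prefix -> hit
    ["summarize", "this", "article", ":", "..."],   # diverse -> miss
]
hits = sum(get_prefix_state(p)[1] for p in prompts)
print(hits)  # only the second prompt reuses a cached prefix
```

The point of the sketch: only the repeated `translate to french :` prefix gets reused; every distinct prompt shape is a miss, which is why caching diverse prompts wholesale tends not to pay off.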
Token Caching Challenges in Large Language Models