This blog post is a really nice way to understand everyday Transformer optimizations like KV cache, FlashAttention and PagedAttention: https://astralord.github.io/posts/transformer-inference-optimization-toolset/
… The image below is an interactive visualization of the KV cache!
Everyday Transformers Optimizations: KV Cache, FlashAttention, PagedAttention
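If you want the core idea in code before opening the post, here is a minimal sketch (my own, not taken from the blog) of what a KV cache does during autoregressive decoding. It assumes a single attention head, random weights and plain Python lists. The names `KVCache`, `decode_step` and `decode_no_cache` are hypothetical and exist only for illustration. Each new token's key and value are computed once and appended to the cache, so later steps only project the newest token instead of recomputing K and V for the whole prefix.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over a list of keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class KVCache:
    # Hypothetical minimal cache: one key and one value vector per past token.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(x, Wq, Wk, Wv, cache):
    # With a cache: project only the newest token, store its K/V,
    # then attend over every cached position (causal, includes itself).
    q, k, v = matvec(Wq, x), matvec(Wk, x), matvec(Wv, x)
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


def decode_no_cache(xs, Wq, Wk, Wv):
    # Without a cache: recompute K and V for the whole prefix at every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    q = matvec(Wq, xs[-1])
    return attend(q, keys, values)


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(5, d)  # five token embeddings of size d

cache = KVCache()
for t, x in enumerate(tokens):
    out_cached = decode_step(x, Wq, Wk, Wv, cache)
    out_full = decode_no_cache(tokens[: t + 1], Wq, Wk, Wv)
    # Both paths must produce the same attention output.
    assert all(abs(a - b) < 1e-9 for a, b in zip(out_cached, out_full))

print(len(cache.keys))  # 5
```

The cached path does O(1) projections per step while the uncached one does O(t), at the cost of memory that grows with sequence length. That memory growth is exactly the problem PagedAttention goes after.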
