AI Dynamics

Global AI News Aggregator

Clarification on AI Model Prefill and Decode Compute Phases

Small correction: prefill is compute-bound; decode is the memory-bandwidth-bound phase. But the KV cache point is spot on. It grows linearly with context length, and every decode step has to read the whole cache.
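To put rough numbers on the linear growth, here is a minimal back-of-the-envelope sketch. The model shape (32 layers, 32 KV heads, head dimension 128, fp16) and the 1000 GB/s memory bandwidth are illustrative assumptions, not figures from the post, and the function names are made up for this example. The time estimate covers only reading the cache once per decode step; it ignores the model weights, which also have to be read.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):
    """Size of the KV cache for one sequence, in bytes.

    All default values are assumed for illustration (fp16 = 2 bytes).
    The leading factor of 2 accounts for storing both K and V per layer.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len


def decode_read_time_ms(seq_len, bandwidth_gb_s=1000.0):
    """Lower bound on per-step decode time from reading the cache once.

    bandwidth_gb_s is an assumed memory bandwidth in GB/s (1 GB = 1e9 bytes).
    """
    return kv_cache_bytes(seq_len) / (bandwidth_gb_s * 1e9) * 1e3


# Cache size and per-step read time both scale linearly with context length.
for n in (1024, 4096, 16384):
    size_mib = kv_cache_bytes(n) / 2**20
    print(f"{n:>6} tokens: {size_mib:8.0f} MiB, "
          f"{decode_read_time_ms(n):.3f} ms per decode step")
```

Under these assumptions the cache costs 0.5 MiB per token, so a 4096-token context holds 2 GiB that must be streamed from memory for every generated token. Prefill, by contrast, processes all prompt tokens in one batched pass, which is why it tends to be limited by compute rather than bandwidth.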

→ View original post on X — @akshay_pachaar