Everyone is scaling compute. That’s not the bottleneck anymore. AI performance is now a data movement problem.
Every token = read → fetch → compute → repeat. And that loop is inefficient on traditional GPUs (like NVIDIA Blackwell B200). More FLOPs won’t fix it. Dataflow
AI Performance Is Now a Data Movement Problem, Not Compute
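
The claim that more FLOPs won’t help can be sanity-checked with a rough roofline-style estimate. The sketch below is a simplification, not a benchmark. It assumes a decode step reads every weight once and does about 2 FLOPs per weight per sequence, and it ignores KV-cache and activation traffic. The `Accelerator` class, the `decode_step_times` helper, the 70B-parameter model, and the hardware numbers are all hypothetical placeholders, not vendor specifications for any real GPU.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_flops: float     # peak compute in FLOP/s (hypothetical value)
    mem_bandwidth: float  # off-chip memory bandwidth in bytes/s (hypothetical value)


def decode_step_times(n_params, bytes_per_param, batch, hw):
    """Return (compute_seconds, memory_seconds) for one decode step.

    Simplified model: every weight is read once per step, and each weight
    costs about 2 FLOPs (multiply + add) per sequence in the batch.
    """
    flops = 2.0 * n_params * batch
    bytes_moved = n_params * bytes_per_param
    return flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth


# Illustrative numbers only, not the specs of any real part.
hw = Accelerator(peak_flops=2.0e15, mem_bandwidth=8.0e12)
compute_s, memory_s = decode_step_times(n_params=70e9, bytes_per_param=2, batch=1, hw=hw)
step_before = max(compute_s, memory_s)  # the slower of the two sets the step time

# Double the peak FLOPs and keep the memory bandwidth fixed.
faster = Accelerator(peak_flops=2 * hw.peak_flops, mem_bandwidth=hw.mem_bandwidth)
compute_s2, memory_s2 = decode_step_times(70e9, 2, 1, faster)
step_after = max(compute_s2, memory_s2)

bound = "memory" if memory_s > compute_s else "compute"
print(f"compute {compute_s * 1e3:.3f} ms, memory {memory_s * 1e3:.3f} ms: {bound}-bound")
print(f"step time before {step_before * 1e3:.3f} ms, after doubling FLOPs {step_after * 1e3:.3f} ms")
```

Under these assumptions the time spent moving weights is far larger than the time spent on arithmetic, so doubling peak FLOPs leaves the per-token step time unchanged. Larger batches shift the balance toward compute, which is why the result depends on the workload.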