AI Dynamics

Global AI News Aggregator

GPT-2 Activation Memory and GPU Cache Analysis

Makes sense. In the GPT-2 (124M) case we're currently doing B=4, T=1024, C=768 => ~3M activations @ float32 => 12MB. The A100's L2 cache is 40MB, and even L1, at 192KB/SM with 108 SMs, adds up to ~20MB (wow, that's more than I expected). The pleasures of smaller networks and caches…
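The arithmetic in the post can be checked with a few lines of Python. This is only a sketch: it assumes the "3M activations" refers to a single (B, T, C) tensor, and it treats KB and MB as binary units (1024-based), which the post does not specify. The variable names are illustrative.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
# Assumptions (not stated in the post): one (B, T, C) tensor, binary KB/MB.
B, T, C = 4, 1024, 768          # batch size, sequence length, channels
BYTES_PER_FLOAT32 = 4

num_activations = B * T * C
activation_bytes = num_activations * BYTES_PER_FLOAT32

MIB = 1024 ** 2
L2_BYTES = 40 * MIB             # A100 L2 cache size, as quoted
L1_BYTES_PER_SM = 192 * 1024    # per-SM L1 size, as quoted
NUM_SMS = 108                   # SM count, as quoted
l1_total_bytes = L1_BYTES_PER_SM * NUM_SMS

print(f"activations:      {num_activations:,}")
print(f"activation bytes: {activation_bytes / MIB:.2f} MiB")
print(f"aggregate L1:     {l1_total_bytes / MIB:.2f} MiB")
print(f"L2:               {L2_BYTES / MIB:.2f} MiB")
print(f"fits in L2:       {activation_bytes <= L2_BYTES}")
```

Under these assumptions a single float32 activation tensor is smaller than both the L2 cache and the aggregate L1 capacity. Note that L1 is private to each SM, so the aggregate figure is a capacity comparison, not one contiguous cache.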

→ View original post on X: @karpathy