AI Dynamics

Global AI News Aggregator

Fighting Physics and the NVIDIA Compiler Stack in Kernel Optimization

Great read! My experience is that you’re fighting physics but also the NVIDIA compiler and the stack overall, and even after pulling *a lot* of tricks we still can’t achieve more than ~80-90% of peak memory bandwidth on many kernels that you’d naively think should be ~100%. And the rabbit hole

→ View original post on X — @karpathy
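
For readers unfamiliar with the "~80-90% of memory bandwidth" figure, here is a minimal sketch of how such a percentage is commonly computed: bytes moved divided by elapsed time, relative to the hardware's peak. The function name and all numbers below are hypothetical illustrations, not measurements from the post.

```python
def achieved_bandwidth_fraction(bytes_read, bytes_written, elapsed_s, peak_bytes_per_s):
    """Return achieved memory bandwidth as a fraction of the peak.

    All arguments are supplied by the caller; nothing here queries real hardware.
    """
    if elapsed_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("elapsed_s and peak_bytes_per_s must be positive")
    achieved_bytes_per_s = (bytes_read + bytes_written) / elapsed_s
    return achieved_bytes_per_s / peak_bytes_per_s


# Hypothetical copy kernel: reads 1 GiB and writes 1 GiB in 1.25 ms
# on a device with an assumed peak of 2 TB/s (illustrative values only).
n_bytes = 1 << 30
frac = achieved_bandwidth_fraction(n_bytes, n_bytes, 1.25e-3, 2.0e12)
print(f"{frac:.1%}")  # prints 85.9%
```

A pure copy kernel does no arithmetic, so it is the kind of kernel one would naively expect to reach ~100% under this measure.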