AI Dynamics

Global AI News Aggregator

Kernel optimization attempts yield minimal performance gains

added under kernel4 https://github.com/karpathy/llm.c/commit/cb791c4ef58d45d58e5af624b0ed41439ac7aeff

a bit surprised to only see ~1-2% out of it, which then washes out in training, as the layernorm is not a top-ranking time kernel. Also tried float4 and unrolling but that didn't improve it too much bleh

→ View original post on X (@karpathy)
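
Why a small kernel-level gain "washes out" in training follows from Amdahl's law: the end-to-end speedup is bounded by the share of runtime the kernel accounts for. The sketch below is only illustrative. The post does not say how much of a training step layernorm takes, so `layernorm_fraction` is a made-up value, and `overall_speedup` is a hypothetical helper, not part of llm.c.

```python
def overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only `fraction` of runtime gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# ASSUMPTION: share of a training step spent in layernorm (not given in the post).
layernorm_fraction = 0.05
# Upper end of the ~1-2% kernel-level gain quoted in the post.
kernel_speedup = 1.02

step_speedup = overall_speedup(layernorm_fraction, kernel_speedup)
print(f"end-to-end gain: {(step_speedup - 1.0) * 100:.2f}%")
```

Under that assumed 5% share, a 2% faster kernel moves the whole step by roughly 0.1%, which is easily lost in run-to-run noise.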
