AI Dynamics

Global AI News Aggregator

CUDA kernels for GPT-2 forward pass implementation in llm.c

Okay, I did a first quick pass of naive CUDA kernels for the forward pass of GPT-2 and pushed everything to one file in llm.c. Still only ~1000 lines of code: https://github.com/karpathy/llm.c/blob/master/train_gpt2.cu
… Current per-iteration timings on my Lambda box (A100 40GB PCIe, B=4, T=1024):
– llm.c: 111ms

→ View original post on X — @karpathy
