Okay, I did a first quick pass of naive CUDA kernels for the forward pass of GPT-2 and pushed everything to one file in llm.c, still only ~1000 lines of code: https://github.com/karpathy/llm.c/blob/master/train_gpt2.cu
Current per-iteration timings on my Lambda box (A100 40GB PCIe), B=4, T=1024:
– llm.c: 111ms