Memory Allocation Strategy in LLM Training Implementation

You can look at the raw training implementation here: https://github.com/karpathy/llm.c/blob/master/train_gpt2.c

You'll see that all the required memory is allocated a single time, at the beginning, in one large block of 1D memory. From then on during training, no memory is created or destroyed, so we stay at a constant memory footprint.