Loading Pretrained Checkpoints: Multi-GPU Memory Challenge

Before you can even start multi-GPU training with a model-parallel framework like DeepSpeed, you need to load the pretrained checkpoint into memory. Worse still, on a machine with multiple GPUs the naive approach loads the checkpoint into host memory once per GPU process, multiplying the host RAM required by the number of GPUs in your job.
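A minimal, framework-free sketch of the problem: plain `pickle` stands in for a real checkpoint loader (such as `torch.load`), and a dict of Python lists is a hypothetical checkpoint. Each simulated rank deserializes its own complete copy, so peak host memory scales with the number of GPU processes.

```python
import pickle

# Hypothetical checkpoint: a dict of parameter lists (plain Python so this
# sketch runs without any deep-learning framework installed).
checkpoint = {f"layer{i}.weight": [0.0] * 10_000 for i in range(8)}

serialized = pickle.dumps(checkpoint)
world_size = 4  # e.g. one training process per GPU

# Naive multi-GPU startup: every rank independently deserializes the full
# checkpoint, so host RAM ends up holding world_size complete copies.
per_rank_copies = [pickle.loads(serialized) for _ in range(world_size)]

print(len(per_rank_copies))  # 4 independent full copies in memory at once
```

In a real job the duplication happens across processes rather than inside one, but the effect on the host is the same: a checkpoint that fits comfortably in RAM once may not fit N times over.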