If you use an optimized #LLM training framework like https://pbase.ai/3DHqnE5, you can bring the host memory overhead back down to a more reasonable 7 * 4 = 28 GiB even when training on multiple GPUs.
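As a minimal sketch of where the 7 * 4 = 28 GiB figure plausibly comes from, assuming (the post does not spell this out) that 7 is the parameter count in billions of a 7B model and 4 is the number of bytes per fp32 parameter, the host then only needs to hold roughly one full copy of the weights rather than one copy per GPU process:

```python
def host_memory_gib(params_billion: float, bytes_per_param: int = 4) -> float:
    """Rough host-side footprint of one full weight copy, in GiB."""
    total_bytes = params_billion * 1e9 * bytes_per_param
    return total_bytes / 2**30


if __name__ == "__main__":
    # A 7B-parameter model at 4 bytes (fp32) per parameter:
    # 7e9 * 4 = 28e9 bytes, i.e. about 26 GiB (the post rounds this to 28 GiB).
    print(f"Host copy of weights: {host_memory_gib(7):.1f} GiB")
```

The key point is that this figure stays constant as you add GPUs, instead of being multiplied by the number of GPU processes on the node.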