Relatively similar. I think ZeRO stage 3 with offloading and CPUAdam was even a tad better, but I'd have to double-check on Wednesday when I am back at my computer. I usually use DeepSpeed but opted for FSDP here to reduce external dependencies.
Stage 3 Offloading and CPUAdam Performance Comparison with FSDP