Anyone know how to derive the '1.5x' communication overhead of FSDP relative to DDP (from the FSDP paper)?
FSDP vs DDP Communication Overhead Derivation Explained
By – Global AI News Aggregator
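One common way to arrive at a 1.5x figure is the volume accounting used for ZeRO-style full sharding. This is a sketch under stated assumptions, not necessarily the exact argument in the FSDP paper. It assumes ring collectives, a model of `psi` parameters on `n` GPUs, parameters freed after the forward pass and re-gathered in the backward pass, and it counts only elements sent per GPU (no latency, no overlap). The function names are hypothetical, chosen for illustration. Under these assumptions DDP performs one all-reduce of the gradients, which is a reduce-scatter plus an all-gather, while FSDP performs an all-gather in forward, an all-gather in backward, and a reduce-scatter of the gradients.

```python
# Hypothetical helper names; a back-of-the-envelope model, not a benchmark.

def ring_all_gather(psi: float, n: int) -> float:
    """Elements each GPU sends in a ring all-gather of psi total elements."""
    return psi * (n - 1) / n


def ring_reduce_scatter(psi: float, n: int) -> float:
    """Elements each GPU sends in a ring reduce-scatter of psi total elements."""
    return psi * (n - 1) / n


def ddp_volume(psi: float, n: int) -> float:
    """DDP: one gradient all-reduce = reduce-scatter + all-gather."""
    return ring_reduce_scatter(psi, n) + ring_all_gather(psi, n)


def fsdp_volume(psi: float, n: int) -> float:
    """Full sharding (assumed): all-gather params in forward,
    all-gather params again in backward, reduce-scatter gradients."""
    return 2 * ring_all_gather(psi, n) + ring_reduce_scatter(psi, n)


def overhead_ratio(psi: float, n: int) -> float:
    """FSDP volume divided by DDP volume under the assumptions above."""
    return fsdp_volume(psi, n) / ddp_volume(psi, n)


if __name__ == "__main__":
    print(overhead_ratio(psi=1e9, n=8))
```

The `(n - 1) / n` factor cancels in the ratio, so the result is 3/2 for any `n > 1` in this model: roughly 3·psi elements for FSDP against 2·psi for DDP. If parameters are kept unsharded between forward and backward, the second all-gather disappears and the two volumes match.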