The ModelParallel scheme above scales to arbitrary model sizes and device counts on a single host, and it also works for multi-host training. A common gotcha in the multi-host case is forgetting to seed everything identically on each host (else each host would initialize different model weights, and the shards would no longer fit together into one consistent model).
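A minimal sketch of why identical seeding matters. The `init_weights` helper below is hypothetical, standing in for whatever per-host weight initialization the framework performs; in a real multi-host job you would instead set one shared seed on every host (e.g. via `keras.utils.set_random_seed(seed)` before building the model). The point is simply that the same seed reproduces the same initial values on every host:

```python
import random

def init_weights(seed, n=4):
    # Hypothetical stand-in for framework weight initialization.
    # Each host seeds its own RNG with the SAME shared seed, so the
    # initial weights it materializes match every other host's.
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

SHARED_SEED = 42  # arbitrary, but it must be identical on all hosts

host0 = init_weights(SHARED_SEED)  # "host 0" initializes its copy
host1 = init_weights(SHARED_SEED)  # "host 1" initializes its copy

assert host0 == host1  # identical initialization on every host
assert init_weights(SHARED_SEED + 1) != host0  # a different seed diverges
```

If the hosts seeded independently (or not at all), each would draw different initial values, and the model shards assembled across hosts would be inconsistent from step zero.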