Custom RDMA Network Achieves 3200 Gbps GPU Memory Transfer Efficiency
Using a custom RDMA-based networking library, we achieved 3200 Gbps GPU memory transfers, bypassing NCCL's limits and reaching 97.1% of theoretical bandwidth. Our latest blog post shares our journey of building a custom high-performance networking solution on AWS.
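To put the 97.1% efficiency figure in absolute terms, here is a quick back-of-the-envelope check. It assumes that 3200 Gbps is the theoretical peak the percentage is measured against; the summary does not state this explicitly. The helper names are illustrative only.

```python
# Back-of-the-envelope check of the headline numbers.
# ASSUMPTION: 3200 Gbps is the theoretical peak and 97.1% is the measured
# fraction of that peak; the post summary does not spell this out.
PEAK_GBPS = 3200.0   # assumed theoretical bandwidth, gigabits per second
EFFICIENCY = 0.971   # reported bandwidth efficiency (97.1%)


def achieved_gbps(peak_gbps: float, efficiency: float) -> float:
    """Return achieved throughput in gigabits per second."""
    return peak_gbps * efficiency


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits/s to gigabytes/s (decimal units, 8 bits per byte)."""
    return gbps / 8.0


achieved = achieved_gbps(PEAK_GBPS, EFFICIENCY)
print(f"peak     = {PEAK_GBPS:.1f} Gbps = {gbps_to_gbytes_per_s(PEAK_GBPS):.1f} GB/s")
print(f"achieved = {achieved:.1f} Gbps = {gbps_to_gbytes_per_s(achieved):.1f} GB/s")
```

Under that assumption, the implied sustained throughput is about 3107.2 Gbps, or roughly 388.4 GB/s out of a 400 GB/s peak. The arithmetic uses decimal units and does not model protocol or header overhead.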
