GPU Matrix Computation Performance Drops at 512×512 Threshold
GPU library performance can be very notchy: the runtime of a batched torch.linalg.solve_ex() call went up by more than 10x going from 511×511 matrices to 512×512 matrices.
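A jump like this is easy to check on your own hardware. Below is a minimal timing sketch that compares n = 511 against n = 512. The batch size, dtype, repeat count and the `time_batched_solve` helper are illustrative assumptions, not the setup behind the original measurement, and the ratio you see will depend on your GPU, PyTorch build and dtype. Because CUDA kernels launch asynchronously, the sketch synchronizes before reading the clock. It also does one warm-up call so that one-time initialization is not counted.

```python
import time

import torch


def time_batched_solve(n, batch=64, repeats=5, device=None, dtype=torch.float32):
    """Median wall-clock seconds for one batched torch.linalg.solve_ex call.

    Hypothetical helper: batch, repeats and dtype are illustrative
    assumptions, not the settings behind the original measurement.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Well-conditioned systems: random matrices shifted by n * I.
    A = torch.randn(batch, n, n, device=device, dtype=dtype)
    A += n * torch.eye(n, device=device, dtype=dtype)
    B = torch.randn(batch, n, 1, device=device, dtype=dtype)

    def sync():
        # CUDA kernels run asynchronously; wait for them before timing.
        if torch.device(device).type == "cuda":
            torch.cuda.synchronize()

    torch.linalg.solve_ex(A, B)  # warm-up: library init, kernel selection
    sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        X, info = torch.linalg.solve_ex(A, B)
        sync()
        times.append(time.perf_counter() - start)
    # info is zero for every batch element whose solve succeeded.
    assert torch.all(info == 0)
    return sorted(times)[len(times) // 2]


if __name__ == "__main__":
    for n in (511, 512):
        print(f"n={n}: {time_batched_solve(n) * 1e3:.2f} ms")
```

Taking the median of several runs, rather than a single call, keeps one slow outlier from being mistaken for a size-dependent cliff. For finer-grained measurements, `torch.utils.benchmark.Timer` handles warm-up and CUDA synchronization for you.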