CUDA Memory Optimization: 511×512 vs 512×512 Performance Analysis
Global AI News Aggregator
Some people are misreading this: 511×511 was FASTER. It looks like at 512×512 and above it falls back to a different code path that makes internal cudaMalloc/cudaFree calls.
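If that explanation is right, the extra time at 512×512 comes from allocating and freeing device memory in the hot path, not from the kernel itself. One way to get a feel for that cost is a small microbenchmark like the sketch below. It is a minimal sketch under assumptions, not the library code being discussed: scaleKernel, timeLaunches, the 512×512 element count, and the iteration counts are hypothetical choices for illustration. It times the same kernel launched with a fresh cudaMalloc/cudaFree around every call versus with one preallocated scratch buffer that is reused.

```cuda
// Minimal sketch (assumption): compare per-call cudaMalloc/cudaFree against a
// reused scratch buffer. scaleKernel and timeLaunches are hypothetical names,
// not the implementation discussed in the comment.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical workload: writes a scaled copy of the input into scratch space.
__global__ void scaleKernel(const float* in, float* scratch, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) scratch[i] = 2.0f * in[i];
}

// Times `iters` launches. If perCallAlloc is true, every launch is bracketed by
// cudaMalloc/cudaFree of the scratch buffer; otherwise one buffer is reused.
static float timeLaunches(const float* dIn, int n, int iters, bool perCallAlloc) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* reused = nullptr;
    if (!perCallAlloc) check(cudaMalloc(&reused, bytes), "cudaMalloc");

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop), "cudaEventCreate");

    check(cudaEventRecord(start), "cudaEventRecord");
    for (int it = 0; it < iters; ++it) {
        float* scratch = reused;
        if (perCallAlloc) check(cudaMalloc(&scratch, bytes), "cudaMalloc");
        scaleKernel<<<blocks, threads>>>(dIn, scratch, n);
        check(cudaGetLastError(), "kernel launch");
        // cudaFree can implicitly synchronize the device, adding to its cost.
        if (perCallAlloc) check(cudaFree(scratch), "cudaFree");
    }
    check(cudaEventRecord(stop), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");

    check(cudaEventDestroy(start), "cudaEventDestroy");
    check(cudaEventDestroy(stop), "cudaEventDestroy");
    if (!perCallAlloc) check(cudaFree(reused), "cudaFree");
    return ms;
}

int main() {
    const int n = 512 * 512;  // element count of a 512x512 float matrix
    const int iters = 200;

    float* dIn = nullptr;
    check(cudaMalloc(&dIn, n * sizeof(float)), "cudaMalloc");
    check(cudaMemset(dIn, 0, n * sizeof(float)), "cudaMemset");

    timeLaunches(dIn, n, 10, false);  // warm-up: context creation, module load

    float reusedMs = timeLaunches(dIn, n, iters, false);
    float perCallMs = timeLaunches(dIn, n, iters, true);
    std::printf("reused buffer:          %.3f ms\n", reusedMs);
    std::printf("cudaMalloc/cudaFree each: %.3f ms\n", perCallMs);

    check(cudaFree(dIn), "cudaFree");
    return 0;
}
```

Compile it with nvcc (for example, nvcc -O2 bench.cu -o bench, where the file name is arbitrary). The absolute numbers depend on the GPU, driver, and CUDA version, so the sketch is only meant to compare the two loops on the same machine.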