Yes, cuBLASLt for GEMMs and cuDNN for flash attention.
The fp32 version will become more educational and will delete these dependencies. For the "mainline" version we just want to be really fast, so we're less discriminating. cuBLASLt I think is an ~ok dep, but cuDNN turned out surprisingly
cuBLASLt and cuDNN Dependencies for Optimized Performance