1/ FlashAttention-3 – adapts FlashAttention to take advantage of modern GPU hardware; its techniques for speeding up attention include producer-consumer asynchrony via warp specialization, interleaving block-wise matmul and softmax operations, and block quantization with incoherent processing that leverages hardware support for FP8 low precision.
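The interleaving of block-wise matmul and softmax builds on the online-softmax recurrence used throughout the FlashAttention line. The NumPy sketch below is illustrative only (function and variable names are mine, not from the paper, and the block size is arbitrary); it shows the two operations that get overlapped on the GPU: a per-block matmul followed by a rescaled softmax accumulation.

```python
import numpy as np

def blockwise_attention(Q, K, V, block_size=64):
    """Toy single-head attention over key/value blocks with an online
    (streaming) softmax - the block-wise matmul + softmax pattern that
    FlashAttention-style kernels interleave on the GPU. Names and block
    size are illustrative, not taken from the FlashAttention-3 code."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(n, -np.inf)          # running max per query row
    row_sum = np.zeros(n)                  # running softmax denominator

    for start in range(0, K.shape[0], block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale             # block-wise matmul (the GEMM step)
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale earlier partial sums
        P = np.exp(S - new_max[:, None])         # block-wise softmax numerator
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against a naive full-matrix attention reference.
rng = np.random.default_rng(0)
Q = rng.standard_normal((128, 32))
K = rng.standard_normal((256, 32))
V = rng.standard_normal((256, 32))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(blockwise_attention(Q, K, V), ref)
```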
FlashAttention-3 Optimizes GPU Attention Operations for Modern Hardware