9). Theory, Analysis, and Best Practices for Sigmoid Self-Attention – proposes Flash-Sigmoid, a hardware-aware and memory-efficient implementation of sigmoid attention that yields up to a 17% inference kernel speed-up over FlashAttention-2 on H100 GPUs; shows that SigmoidAttn
Flash-Sigmoid: Hardware-Efficient Attention 17% Faster
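For orientation, here is a minimal NumPy sketch of what sigmoid attention computes: the row-wise softmax over the scaled query-key scores is replaced by an element-wise sigmoid. This is a plain reference computation, not the fused Flash-Sigmoid kernel, and it makes no claim about speed. The function names and the default bias of -log(n), where n is the number of keys, are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    # Logistic function written via tanh, which avoids overflow for large |x|.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def sigmoid_attention(q, k, v, bias=None):
    """Reference (unfused) sigmoid attention.

    q: (..., n_q, d), k: (..., n_k, d), v: (..., n_k, d_v).
    bias: scalar added to the logits; defaults to -log(n_k) (an assumption).
    """
    n_k, d = k.shape[-2], k.shape[-1]
    if bias is None:
        bias = -np.log(n_k)
    # Scaled dot-product scores, as in standard attention.
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(d) + bias
    # Element-wise sigmoid: each weight is computed independently,
    # so rows are not normalized to sum to 1 as they are with softmax.
    weights = sigmoid(logits)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((2, 4, 8))
    k = rng.standard_normal((2, 4, 8))
    v = rng.standard_normal((2, 4, 8))
    print(sigmoid_attention(q, k, v).shape)
```

Because each weight depends only on its own query-key score, no row-wise maximum or sum has to be tracked across keys, which is one plausible reason a fused kernel for this operation can be simpler than a softmax one.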