AI Dynamics

Global AI News Aggregator

SageAttention: 8-Bit Quantization for Efficient Attention

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Paper: https://arxiv.org/abs/2410.02367
Code: https://github.com/thu-ml/SageAttention
… This work from Tsinghua University proposes SageAttention, an efficient and accurate 8-bit quantization method for attention.
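
To give a rough sense of what 8-bit quantization of attention means, here is a minimal sketch in plain Python. It assumes a generic symmetric per-row INT8 scheme applied to Q and K before the score computation; this is an illustration only, not necessarily the scheme used in the paper, and every function name below is hypothetical.

```python
import math
import random


def quantize_rows_int8(mat):
    """Symmetric per-row INT8 quantization (illustrative, not the paper's scheme).

    Returns the integer rows and one floating-point scale per row.
    """
    q_rows, scales = [], []
    for row in mat:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, quantized):
    """Single-head attention; optionally computes Q K^T from INT8 values."""
    d = len(q[0])
    if quantized:
        q_i8, q_s = quantize_rows_int8(q)
        k_i8, k_s = quantize_rows_int8(k)
        # Integer dot products, dequantized by the product of the two row scales.
        scores = [
            [sum(a * b for a, b in zip(qi, kj)) * q_s[i] * k_s[j]
             for j, kj in enumerate(k_i8)]
            for i, qi in enumerate(q_i8)
        ]
    else:
        scores = [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    probs = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # The P V product is kept in floating point in this sketch.
    return [
        [sum(p * vj[c] for p, vj in zip(prow, v)) for c in range(len(v[0]))]
        for prow in probs
    ]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)


def rand_mat(rows, cols):
    """Random Gaussian test matrix (toy data for the comparison)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


q, k, v = rand_mat(4, 16), rand_mat(6, 16), rand_mat(6, 16)
exact = attention(q, k, v, quantized=False)
approx = attention(q, k, v, quantized=True)
error = max_abs_diff(exact, approx)
print(f"max abs difference between full-precision and INT8 scores path: {error:.5f}")
```

The sketch only shows the idea of replacing the floating-point Q K^T product with an integer one plus rescaling. The speedups come from running the integer product on hardware INT8 units, which this toy code does not attempt.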

→ View original post on X — @jiqizhixin