"Kwai Summary Attention Technical Report" This paper uses learnable summary tokens for long-context attention. It splits text into chunks, compresses each chunk into a summary token, keeps recent text dense within a sliding local window, and reads distant context through
Kwai Summary Attention: Learnable Tokens for Long Context
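The mechanism in the summary above (chunking, one summary token per chunk, a dense local window, and a compressed view of distant context) can be illustrated with a small sketch of which positions a single query is allowed to read. This is a hedged illustration, not the paper's implementation: the function name `visible_context`, the chunk-aligned window controlled by `local_chunks`, and the assumption that distant chunks are read only through their summary tokens are all choices made here for clarity.

```python
def visible_context(i, chunk_size, local_chunks):
    """Sketch: what a causal query at position i may attend to.

    Assumptions (hypothetical, not taken from the paper):
      - chunk k covers positions [k * chunk_size, (k + 1) * chunk_size - 1];
      - the local window is chunk-aligned and spans the query's own chunk
        plus the (local_chunks - 1) chunks before it;
      - every earlier chunk is visible only through its one summary token.

    Returns (summary_chunk_ids, raw_token_positions).
    """
    query_chunk = i // chunk_size
    first_dense_chunk = max(0, query_chunk - local_chunks + 1)

    # Recent text stays dense: raw tokens from the start of the window up to i.
    raw_positions = list(range(first_dense_chunk * chunk_size, i + 1))

    # Distant text is compressed: one summary per chunk before the window.
    summary_chunks = list(range(first_dense_chunk))
    return summary_chunks, raw_positions


if __name__ == "__main__":
    summaries, raw = visible_context(i=9, chunk_size=4, local_chunks=1)
    print("summary chunks:", summaries)   # chunks 0 and 1
    print("raw positions:", raw)          # positions 8 and 9
    print("keys read:", len(summaries) + len(raw), "vs full attention:", 9 + 1)
```

Under these assumptions the number of keys a query reads grows with the number of chunks rather than the number of tokens, which is the intuition behind using summary tokens for long context.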
