Thanks for highlighting our team's paper! Key findings show attention-based token importance transfers well across models, enabling training-free prompt compression with ~90-100% performance retention and faster first-token latency. Check it out:

Natural Language Processing Papers (@HEI)
Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
Shubhangi Upasani, Ravi Shanker Raju, Bo Li, Mengmeing Ji, John Long, Chen Wu, Urmish Thakker, Guangtao Wang
arxiv.org/abs/2603.02631
https://nitter.net/HEI/status/2029181798924660997#m
Original post on X: @sambanovaai, 2026-04-01 21:33 UTC
