Another research paper accepted to ICLR 2026. We explored a new way to shrink long prompts using smaller draft models from different model families, with no retraining needed. The result is a faster time to first token, with performance holding strong. Take a look @UrmishThakker
Research on Prompt Compression Using Draft Models Accepted to ICLR