SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Paper page: https://huggingface.co/papers/2307.02628
… Autoregressive large language models (LLMs) have made remarkable progress in various natural language generation tasks. However, they incur high
SkipDecode: Efficient LLM Inference with Autoregressive Skip Decoding
