9/ Medusa – a framework for LLM inference acceleration using multiple decoding heads; substantially reduces the number of decoding steps and achieves over 2.2x speedup without compromising generation quality.
Medusa Framework Achieves 2.2x LLM Inference Speedup
By
–