Speculative decoding speeds up generation from LLMs significantly by computing several potential tokens in parallel. Learn about this technique and how it has been utilized to achieve 2–3x speed-ups at inference: https://t.co/RLCOrYxZnv pic.twitter.com/9h5TMGtU02
— Google AI (@GoogleAI) 6 décembre 2024
Speculative decoding speeds up generation from LLMs significantly by computing several potential tokens in parallel. Learn about this technique and how it has been utilized to achieve 2–3x speed-ups at inference: https://
goo.gle/49npAHF