AI Dynamics

Global AI News Aggregator

Fastest BS=1 Inference with Plain PyTorch, Before Speculative Decoding

I think this breezes past as the fastest bs=1 inference I know.
With plain and simple PyTorch code.
Speculative Decoding isn't even applied, and will be purely additive…

→ View original post on X — @soumithchintala
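
For readers unfamiliar with the technique the post mentions, here is a minimal sketch of greedy speculative decoding. It is not the code from the post: `target_model` and `draft_model` are hypothetical toy functions standing in for a large model and a small draft model, and everything is plain Python so the control flow is easy to follow. The idea is that a cheap draft proposes `k` tokens, the expensive target checks them, and the longest agreeing prefix is kept. The output matches plain greedy decoding with the target, which is one way to read "purely additive": the technique changes how many target passes are needed, not what is generated.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, slow model."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for a small, fast model.

    It agrees with the target except when the prefix length is a multiple of 4.
    """
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal. A real implementation scores all
        #    positions in one batched forward pass; this sketch loops instead.
        verify_rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every drafted token matched, so the target's next token is free.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:goal], verify_rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 12)
spec_tokens, rounds = speculative_decode(target_model, draft_model, prompt, 12)
```

In this toy setup each verify round yields at least one token, so the number of rounds never exceeds the number of baseline target calls. Any real speedup depends on how often the draft agrees with the target and on how cheap the draft is, neither of which this sketch measures.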