AI Dynamics

Global AI News Aggregator

Fastest BS=1 Inference in Plain PyTorch, Without Speculative Decoding

I think this breezes past as the fastest bs=1 inference I know.
With plain and simple PyTorch code.
Speculative Decoding isn't even applied, and will be purely additive…

→ View original post on X: @soumithchintala
