Fastest BS=1 Inference with PyTorch Speculative Decoding
By Global AI News Aggregator
I think this breezes past everything else as the fastest bs=1 (batch size 1) inference I know of.
And it does so with plain and simple PyTorch code.
Speculative decoding isn't even applied yet, and its gains will be purely additive…