4/5 The confusion: the scheduler saw "1 token allocated" and assumed the request was in decode. But a 1-token allocation can also mean a new request that hit the token budget limit. The fix: check whether this is the request's first-ever token, not the allocation size. Fixed in vLLM v0.14.0. Update if you're running Mamba.
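To make the distinction concrete, here is a minimal sketch of the two checks. It is not vLLM's scheduler code: the `Request` class, its fields, and both function names are hypothetical, chosen only to illustrate the idea of classifying by "has anything been computed yet" rather than by allocation size.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a scheduled request (not vLLM's actual class)."""
    prompt_len: int
    num_computed_tokens: int = 0  # tokens already processed in earlier steps


def is_decode_by_allocation(num_scheduled_tokens: int) -> bool:
    # Old heuristic (assumed shape): any 1-token allocation is treated as decode.
    return num_scheduled_tokens == 1


def is_first_token(req: Request) -> bool:
    # Fixed idea: a request is new if nothing has been computed for it yet,
    # no matter how many tokens the budget allowed in this step.
    return req.num_computed_tokens == 0


# A brand-new request whose allocation was clipped to 1 token by the budget.
new_req = Request(prompt_len=128)
scheduled = 1

print(is_decode_by_allocation(scheduled))  # True: misclassified as decode
print(is_first_token(new_req))             # True: correctly seen as new
```

Both checks return `True` for the clipped request, which is exactly the conflict: the allocation-size check calls it decode, while the first-token check identifies it as a new request.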
vLLM v0.14.0: Scheduler bug fix for Mamba token allocation