"Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding": speculative decoding for RL rollouts! This paper speeds up post-training without changing the target policy's sampling distribution. A draft model proposes multiple tokens, and the policy verifies them in parallel.
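How can you skip tokens and still sample from the policy's distribution? The standard speculative-sampling accept/reject rule answers this. Below is a minimal sketch of that generic rule, not the paper's system-integrated implementation. It assumes toy probability vectors in place of real model outputs, and the helper names `sample` and `speculative_step` are hypothetical. Each draft token x is accepted with probability min(1, p(x)/q(x)), where q is the draft distribution and p is the target policy's distribution. On rejection, a replacement is drawn from the normalized residual max(0, p - q). If every draft token is accepted, one bonus token comes from the target.

```python
import random
from typing import List, Sequence

# Hypothetical toy representation: a probability vector over a small vocabulary.
Dist = Sequence[float]


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw a token index from (possibly unnormalized) weights."""
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(
    draft_dists: List[Dist],   # q_i: draft distribution at each of k positions
    target_dists: List[Dist],  # p_i: target distribution at k + 1 positions
    draft_tokens: List[int],   # tokens the draft sampled from each q_i
    rng: random.Random,
) -> List[int]:
    """Accept/reject k draft tokens so outputs follow the target distribution."""
    out: List[int] = []
    # zip stops after k positions; the extra target entry is used for the bonus.
    for q, p, x in zip(draft_dists, target_dists, draft_tokens):
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)  # draft token accepted
            continue
        # Rejected: resample from the residual max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out  # stop at the first rejection
    # All k draft tokens accepted: take one bonus token from the target.
    out.append(sample(target_dists[len(draft_tokens)], rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.6, 0.3, 0.1]  # toy draft distribution
    p = [0.2, 0.5, 0.3]  # toy target-policy distribution
    n = 100_000
    counts = [0, 0, 0]
    for _ in range(n):
        x = sample(q, rng)  # draft proposes one token
        counts[speculative_step([q], [p, p], [x], rng)[0]] += 1
    # Empirical frequencies of the first emitted token should be close to p.
    print([round(c / n, 3) for c in counts])
```

The residual step is what keeps the output distribution unchanged. Accepted tokens contribute min(p, q). Rejections contribute exactly the missing mass max(0, p - q), and the two sum to p. The speedup depends on how often the draft agrees with the policy, which this toy example does not model.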
Speculative Decoding Accelerates RL Post-Training Rollouts