"FASTER: Value-Guided Sampling for Fast RL": Instead of fully denoising many action candidates and picking the best one at the end, the method learns a critic over the noise seed and selects the most promising sample up front. The authors showed that the advantage of best-of-N is already visible
FASTER: Value-Guided Sampling Improves Reinforcement Learning Efficiency
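To make the contrast with ordinary best-of-N concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the denoiser, the value function, the quadratic-feature seed critic, and all names (`denoise`, `action_value`, `fit_seed_critic`, `N_STEPS`) are assumptions made up for illustration. It only shows where the compute saving would come from: scoring noise seeds before denoising means only one candidate is denoised.

```python
import numpy as np

N_STEPS = 8  # denoising steps per candidate (assumed value for this toy)

def denoise(noise, n_steps=N_STEPS):
    """Toy stand-in for an iterative denoiser: moves x toward tanh(noise)."""
    x = noise.copy()
    for _ in range(n_steps):
        x = x + 0.5 * (np.tanh(noise) - x)
    return x

def action_value(action):
    """Toy value of a finished action; the peak is at action = 0.5."""
    return -np.sum((action - 0.5) ** 2, axis=-1)

def seed_features(noises):
    """Quadratic features [1, z, z^2] of the noise seed (an assumption)."""
    noises = np.atleast_2d(noises)
    ones = np.ones((len(noises), 1))
    return np.concatenate([ones, noises, noises ** 2], axis=1)

def fit_seed_critic(rng, dim, n_train=2000):
    """Hypothetical seed critic: least-squares fit from noise to final value."""
    z = rng.standard_normal((n_train, dim))
    targets = action_value(denoise(z))
    w, *_ = np.linalg.lstsq(seed_features(z), targets, rcond=None)
    return lambda noises: seed_features(noises) @ w

def best_of_n_full(noises):
    """Standard best-of-N: denoise every candidate, then pick the best."""
    actions = denoise(noises)
    idx = int(np.argmax(action_value(actions)))
    return actions[idx], len(noises) * N_STEPS

def best_of_n_seed(noises, seed_critic):
    """Seed-level selection: score the seeds, denoise only the chosen one."""
    idx = int(np.argmax(seed_critic(noises)))
    return denoise(noises[idx]), N_STEPS

rng = np.random.default_rng(0)
critic = fit_seed_critic(rng, dim=2)
noises = rng.standard_normal((16, 2))  # N = 16 candidate seeds

a_full, cost_full = best_of_n_full(noises)
a_seed, cost_seed = best_of_n_seed(noises, critic)
print("denoising steps:", cost_full, "vs", cost_seed)  # 128 vs 8
```

In this toy, seed-level selection can never beat full best-of-N on value, because both choose from the same candidates. The trade is a possibly worse pick in exchange for N times fewer denoising steps. How closely it matches the full procedure depends on how well the critic predicts the final value from the seed.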