You’re taking it too literally. I’m trying to point to a way to improve RL by recognizing that at a certain level of abstraction it’s doing something similar to backprop (namely, efficiently propagating later results back to earlier choices via dynamic programming).
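To make the analogy concrete, here is a minimal sketch, not a claim about how any particular RL or deep learning library works. It assumes a single finite episode with scalar rewards and a scalar chain of functions; the names `discounted_returns`, `chain_gradients`, and `local_derivs` are hypothetical and chosen only for illustration. Both functions make one reverse pass that reuses the value already computed for the following step, which is the dynamic programming structure the comment points at.

```python
def discounted_returns(rewards, gamma):
    """RL side: backward recursion G_t = r_t + gamma * G_{t+1}.

    One reverse pass credits each earlier step with everything that
    happened after it, reusing the result from step t + 1.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # return after the final step is zero
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def chain_gradients(local_derivs):
    """Backprop side: reverse sweep over a chain x_0 -> x_1 -> ... -> x_T.

    local_derivs[t] is assumed to hold d x_{t+1} / d x_t. The result at
    index t is d x_T / d x_t, built from the result at index t + 1.
    """
    grads = [0.0] * len(local_derivs)
    running = 1.0  # d x_T / d x_T
    for t in reversed(range(len(local_derivs))):
        running = local_derivs[t] * running
        grads[t] = running
    return grads


if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 2.0], 0.5))
    print(chain_gradients([2.0, 3.0, 4.0]))
```

The two loops differ only in how the stored value is combined with the local quantity (add and discount versus multiply), which is the level of abstraction at which the comparison holds.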
Reinforcement Learning and Backpropagation: A Dynamic Programming Perspective