But this is well below that. Let me put it another way: how can we improve RL by making it more like backprop? (And by RL I mean sequential decision making, not the current set of techniques for doing it.)
Improving Sequential Decision Making by Incorporating Backpropagation Principles