Yeah exactly. I get triggered when RL is dressed up in its full rigorous math formalism because it's gate-keeping an essentially trivial core idea.
SL: a token sequence comes from some 3rd party source (e.g. human demonstration), and you just train on it.
RL: you first sample a
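To make the contrast concrete, here is a minimal toy sketch in plain Python. The SL half follows the description above: the token is handed to you and you raise its log-probability. The RL description is cut off in the text, so everything past "sample" is an assumption: the sketch uses a standard REINFORCE-style update, where the token is drawn from the model itself and the same gradient is scaled by a reward. The three-token vocabulary, the reward function, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into a probability distribution over tokens."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, token):
    """Gradient of log softmax(logits)[token] w.r.t. the logits: onehot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == token else 0.0) - p for i, p in enumerate(probs)]


def sl_step(logits, demo_token, lr=0.5):
    """SL: the token comes from an outside source; just raise its log-probability."""
    g = grad_log_prob(logits, demo_token)
    return [w + lr * gi for w, gi in zip(logits, g)]


def rl_step(logits, reward_fn, rng, lr=0.5):
    """RL (assumed REINFORCE-style): sample the token from the model itself,
    score it, then apply the same update scaled by the reward."""
    probs = softmax(logits)
    token = rng.choices(list(range(len(logits))), weights=probs)[0]
    reward = reward_fn(token)
    g = grad_log_prob(logits, token)
    return [w + lr * reward * gi for w, gi in zip(logits, g)]


def train_sl(steps=200):
    """Train on a fixed demonstration: token 2 is always the 'right' answer."""
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = sl_step(logits, demo_token=2)
    return softmax(logits)


def train_rl(steps=500, seed=0):
    """Train from the model's own samples: only token 2 earns a reward."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = rl_step(logits, lambda t: 1.0 if t == 2 else 0.0, rng)
    return softmax(logits)
```

In this sketch the two update rules differ only in where the token comes from and in the reward factor; the gradient of the log-probability is the same in both.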
RL vs SL: Demystifying Reinforcement Learning’s Core Concept