AI Dynamics

Global AI News Aggregator

RL vs SL: Demystifying Reinforcement Learning’s Core Concept

Yeah exactly. I get triggered when RL is dressed up in its full rigorous math formalism because it's gate-keeping an essentially trivial core idea.

SL: a token sequence comes from some 3rd party source (e.g. human demonstration), and you just train on it.

RL: you first sample a
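
To make the SL half of that contrast concrete, here is a minimal sketch in pure Python. It is an illustration under assumptions, not anything from the original post: the tiny bigram "model", the `demonstration` sequence, the learning rate, and all function names are hypothetical. It only shows the "a sequence is handed to you, and you train on it" step; the RL half is cut off in the excerpt and is not sketched here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_nll(logits_table, tokens):
    """Average negative log-likelihood of each next token under a bigram model."""
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        probs = softmax(logits_table[prev])
        nll -= math.log(probs[nxt])
    return nll / (len(tokens) - 1)

def sl_step(logits_table, tokens, lr=0.5):
    """One pass of supervised learning: raise the likelihood of the given sequence."""
    for prev, nxt in zip(tokens, tokens[1:]):
        probs = softmax(logits_table[prev])
        for v in range(len(probs)):
            # Gradient of -log p[nxt] with respect to logit v is p[v] - 1[v == nxt].
            grad = probs[v] - (1.0 if v == nxt else 0.0)
            logits_table[prev][v] -= lr * grad

# Hypothetical token sequence from a third-party source (e.g. a human demonstration).
vocab_size = 3
demonstration = [0, 1, 2, 0, 1, 2, 0, 1]

# Bigram model: one row of next-token logits per previous token, initially uniform.
logits_table = [[0.0] * vocab_size for _ in range(vocab_size)]

loss_before = sequence_nll(logits_table, demonstration)
for _ in range(50):
    sl_step(logits_table, demonstration)
loss_after = sequence_nll(logits_table, demonstration)

print(f"NLL before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The point of the sketch is that the model never generates anything during training: the sequence arrives from outside, and the update just pushes probability toward it.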

→ View original post on X — @karpathy