AI Dynamics

Global AI News Aggregator

LLM Training: RL vs SFT with In-Context Learning

"Don’t be difficult. I mean this is obvious." Sutton is right ofc. The analogue in LLM land to what humans do is something along the lines of: Given this math problem AND human example solution in the context, solve the problem. Reward of 1 if correct. It's not SFT, it's RL.
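To make the distinction concrete, here is a minimal sketch of the setup described in the post: the human example solution goes into the context, the model produces its own answer, and the only training signal is a reward of 1 when that answer is correct. Everything named here (`build_prompt`, `reward`, `reinforce_step`, `train`, and the toy softmax policy over a fixed candidate list) is a hypothetical stand-in chosen for illustration, not anything from the original post. A real LLM would condition on the prompt, whereas this toy policy does not. The update is a plain REINFORCE step, picked as one simple example of an RL update.

```python
import math
import random


def build_prompt(problem: str, example_solution: str) -> str:
    """Put the human example in the context; it is never used as a training target."""
    return f"Example solution:\n{example_solution}\n\nProblem:\n{problem}\nAnswer:"


def reward(answer: str, correct: str) -> float:
    """Binary reward: 1 if the sampled answer matches the correct one, else 0."""
    return 1.0 if answer.strip() == correct.strip() else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, candidates, correct, lr=0.5, rng=random):
    """One REINFORCE update on a toy policy (assumption: a softmax over fixed candidates)."""
    probs = softmax(logits)
    idx = rng.choices(range(len(candidates)), weights=probs)[0]  # sample an answer
    r = reward(candidates[idx], correct)
    # Gradient of log pi(idx) with respect to the logits is onehot(idx) - probs.
    new_logits = [
        l + lr * r * ((1.0 if i == idx else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, r


def train(candidates, correct, steps=200, seed=0):
    """Run several updates and return the final answer probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        logits, _ = reinforce_step(logits, candidates, correct, rng=rng)
    return softmax(logits)


if __name__ == "__main__":
    prompt = build_prompt("What is 2 + 2?", "1 + 1 = 2, because one more than 1 is 2.")
    print(prompt)
    print(train(["3", "4", "5"], correct="4"))
```

Under SFT, the loss would instead push up the likelihood of the human solution's tokens directly. In this sketch the example only shapes the context, and the parameters move only when a sampled answer earns reward.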

→ View original post on X (@karpathy)
