AI Dynamics

Global AI News Aggregator

Learning Methods Unified Through Gradient Optimization Framework

Funny @sirbayes Learning methods — supervised, RLHF, policy gradients, DAgger, self-training — can be seen as optimisation with the following gradient: grad = E_{x,y}[ F(x,y) grad log p(y|x) ]. The choice of F, and how x and y are produced, determines the learning type. 1/n

→ View original post on X — @nandodf
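The unifying idea in the quoted tweet can be made concrete with a toy sketch: a single estimator E_{x,y}[ F(x,y) grad log p(y|x) ] recovers the supervised maximum-likelihood gradient when y comes from labelled data and F ≡ 1, and the REINFORCE policy gradient when y is sampled from the model and F is a reward. The softmax-linear model, the `reward` function, and all names below are hypothetical illustrations, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax(z):
    # numerically stable log softmax
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def grad_log_p(W, x, y):
    # d/dW log p(y|x) for a softmax model with logits z = W @ x:
    # equals e_y x^T - p x^T
    p = np.exp(log_softmax(W @ x))
    g = -np.outer(p, x)
    g[y] += x
    return g

def unified_gradient(W, samples, F):
    # Monte Carlo estimate of E_{x,y}[ F(x,y) * grad log p(y|x) ]
    return sum(F(x, y) * grad_log_p(W, x, y) for x, y in samples) / len(samples)

W = np.zeros((3, 4))  # 3 classes/actions, 4 input features

# Supervised learning: (x, y) pairs come from labelled data, F(x, y) = 1,
# so the estimator reduces to the maximum-likelihood gradient.
data = [(rng.normal(size=4), int(rng.integers(3))) for _ in range(8)]
g_supervised = unified_gradient(W, data, F=lambda x, y: 1.0)

# Policy gradient (REINFORCE): y is sampled from the current policy p(y|x)
# and F(x, y) is a reward (here a hypothetical 0/1 reward).
def sample_y(W, x):
    p = np.exp(log_softmax(W @ x))
    return int(rng.choice(len(p), p=p))

reward = lambda x, y: float(y == 0)
rollouts = [(x, sample_y(W, x)) for x, _ in data]
g_policy = unified_gradient(W, rollouts, F=reward)

print(g_supervised.shape, g_policy.shape)
```

Changing how y is produced (data vs. model samples) and what F weights it (constant, reward, expert agreement, model confidence) switches between the learning methods listed in the tweet, without changing the gradient estimator itself.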
