AI Dynamics

Global AI News Aggregator

Policy Gradients: Q-Functions and State-Action Value Learning

Policy gradients: F = Q(x, y), the state-action value function, where x and y are generated by the model acting on an environment under the policy p(y|x). The states x are drawn from the invariant (stationary) state distribution, as in the policy gradient theorem.
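As a rough illustration of the statement above, the sketch below evaluates the policy gradient exactly on a small random MDP and compares it with a finite-difference estimate. Everything in it is an assumption made for illustration and not part of the original post: the tabular softmax policy with logits `theta`, the average-reward formulation (chosen because it is the setting where the invariant state distribution appears directly), the random transition tensor `P` and reward table `R`, and all function names.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax: p(y|x) for each state x.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stationary_distribution(P_pi):
    # Solve d @ P_pi = d with sum(d) = 1 (assumes an ergodic chain).
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d

def average_reward(theta, P, R):
    # Long-run average reward J(theta) under the softmax policy.
    pi = softmax(theta)
    P_pi = np.einsum("xy,xyz->xz", pi, P)
    d = stationary_distribution(P_pi)
    return float(d @ (pi * R).sum(axis=1))

def policy_gradient(theta, P, R):
    # Exact gradient E_{x~d, y~p(.|x)}[Q(x, y) * grad log p(y|x)].
    pi = softmax(theta)
    P_pi = np.einsum("xy,xyz->xz", pi, P)
    d = stationary_distribution(P_pi)
    r_pi = (pi * R).sum(axis=1)
    J = d @ r_pi
    n = len(d)
    # Differential state values from the Poisson equation.
    v = np.linalg.solve(np.eye(n) - P_pi + np.outer(np.ones(n), d), r_pi - J)
    # Differential state-action values Q(x, y).
    Q = R - J + np.einsum("xyz,z->xy", P, v)
    # For softmax logits the expectation has this closed form.
    expected_Q = (pi * Q).sum(axis=1, keepdims=True)
    return d[:, None] * pi * (Q - expected_Q)

def finite_difference_gradient(theta, P, R, eps=1e-6):
    # Central differences on J(theta), used only as a sanity check.
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps
        grad[idx] = (average_reward(theta + step, P, R)
                     - average_reward(theta - step, P, R)) / (2 * eps)
    return grad

# Hypothetical toy problem: 3 states, 2 actions, strictly positive transitions.
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((3, 2))
theta = rng.normal(size=(3, 2))

print(policy_gradient(theta, P, R))
print(finite_difference_gradient(theta, P, R))
```

The two printed arrays should agree up to numerical error. In practice the expectation is estimated from sampled (x, y) pairs instead of being computed exactly, and Q is usually replaced by a learned or Monte Carlo estimate.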

→ View original post on X — @nandodf
