AI Dynamics

Global AI News Aggregator

DAgger Imitation Learning: Human Feedback for Agent Training

Imitation with DAgger: In counterfactual learning, F is typically the identity. The agent acting with its policy p(y|x) determines the inputs x, as in RL, but humans (or other agents) provide corrections in the form of labels y. The aggregated data is then used for retraining.
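The loop described above (roll out the learner's own policy to determine which states are visited, query the expert for corrections, aggregate, retrain) can be sketched in a minimal tabular form. The toy chain environment, the hand-coded expert, and the majority-vote learner below are all illustrative assumptions, not details from the post:

```python
import random

random.seed(0)

def expert(state):
    # Hypothetical expert: move right (action 1) until state 5, then stay (action 0).
    return 1 if state < 5 else 0

def rollout(policy, steps=10):
    # The learner's own policy determines which states x are visited.
    state, visited = 0, []
    for _ in range(steps):
        visited.append(state)
        state = min(9, state + policy(state))
    return visited

# Aggregated dataset D of (state -> list of expert labels).
D = {}

def learner(state):
    # Tabular policy: imitate the majority expert label seen for this state;
    # act randomly in states never labeled by the expert.
    labels = D.get(state)
    return max(set(labels), key=labels.count) if labels else random.choice([0, 1])

for iteration in range(5):
    # 1. Act with the current policy to collect states (the x's).
    states = rollout(learner)
    # 2. The expert provides corrections (the y's) for those states.
    for s in states:
        D.setdefault(s, []).append(expert(s))
    # 3. "Retrain": the tabular learner reads the aggregated D directly.
```

Because the learner, not the expert, generates the visited states, the expert's corrections cover exactly the distribution the learner encounters, which is the point of dataset aggregation.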

→ View original post on X — @nandodf
