Imitation with DAgger: In counterfactual learning, F is typically the identity. As in RL, the agent acting with policy p(y|x) determines which x’s are visited, but humans (or other agents) provide corrections in the form of y’s. The corrected data are aggregated with the earlier data and used to retrain the policy.
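The loop described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a reference implementation: the 1-D environment, the `expert` rule, the nearest-neighbour `fit` step, and the `rollout` helper are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


def expert(x):
    """Hypothetical expert (stand-in for the human): push the state toward zero."""
    return -1 if x > 0 else 1


def fit(dataset):
    """Retrain: build a 1-nearest-neighbour policy from the (x, y) pairs (assumed learner)."""
    points = list(dataset)

    def policy(x):
        if not points:
            return 1  # arbitrary default action before any data exists
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return policy


def rollout(policy, rng, steps=20):
    """Run the agent's own policy; the visited x's depend on its actions."""
    x = rng.uniform(-1.0, 1.0)
    visited = []
    for _ in range(steps):
        visited.append(x)
        x += 0.1 * policy(x) + rng.gauss(0.0, 0.02)  # assumed toy dynamics
    return visited


def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    dataset = []
    policy = fit(dataset)
    for _ in range(iterations):
        states = rollout(policy, rng)                    # agent determines the x's
        dataset.extend((x, expert(x)) for x in states)   # expert provides the y's
        policy = fit(dataset)                            # retrain on all data so far
    return policy, dataset
```

Calling `policy, dataset = dagger()` returns the final policy together with the aggregated corrections. The point of the sketch is that the states in `dataset` come from the agent's own rollouts, while every label comes from the expert.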
DAgger Imitation Learning: Human Feedback for Agent Training