DAGGER is a form of counterfactual teaching, as explained in https://arxiv.org/abs/2110.10819. Note that it is always the student who acts. The teacher only provides corrections, which are used to minimise the LLM loss directly. Note, however, that this imitation is NOT supervised learning.
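
The loop described above can be sketched roughly as follows. This is a minimal toy illustration, not the method from the linked paper and not an LLM: the chain environment, the tabular softmax student, the always-move-right teacher, and every name in the code (`teacher_action`, `student_rollout`, `train`, `dagger`) are assumptions made up for the example. What it tries to show is the division of labour: the student generates the states, the teacher only labels them, and the student minimises a cross-entropy loss on those labels.

```python
import math
import random

N_STATES = 6       # hypothetical toy chain: states 0..5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def teacher_action(state):
    """Hypothetical teacher: always recommends moving right."""
    return 1


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_rollout(logits, rng, horizon=10):
    """The student acts on its own; return the states it visited."""
    state, visited = 0, []
    for _ in range(horizon):
        visited.append(state)
        probs = softmax(logits[state])
        action = 0 if rng.random() < probs[0] else 1
        step = 1 if action == 1 else -1
        state = max(0, min(N_STATES - 1, state + step))
    return visited


def train(logits, dataset, lr=0.5, epochs=20):
    """Gradient descent on the cross-entropy loss against teacher labels."""
    for _ in range(epochs):
        for state, label in dataset:
            probs = softmax(logits[state])
            for a in ACTIONS:
                # d(cross-entropy)/d(logit_a) = p_a - 1[a == label]
                grad = probs[a] - (1.0 if a == label else 0.0)
                logits[state][a] -= lr * grad


def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0, 0.0] for _ in range(N_STATES)]  # student parameters
    dataset = []                                    # aggregated corrections
    for _ in range(iterations):
        # 1. The student, not the teacher, chooses every action.
        visited = student_rollout(logits, rng)
        # 2. The teacher only says what it would have done in those states.
        dataset.extend((s, teacher_action(s)) for s in visited)
        # 3. The student is retrained on everything collected so far.
        train(logits, dataset)
    return logits, dataset


if __name__ == "__main__":
    logits, dataset = dagger()
    print(len(dataset), softmax(logits[0]))
```

One way to read the "not supervised learning" remark against this sketch: the inner `train` step looks like ordinary supervised fitting, but the states in `dataset` come from the student's own rollouts and change as the student changes, so there is no fixed, pre-collected training set.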
DAGGER Counterfactual Teaching Method for LLM Training