While DAGGER is a great way to bring feedback into LLM training (e.g. in chat), it is not a replacement for RL, because RL leaves room for other forms of feedback (e.g. preferences). However, as a teacher, I would advise carefully measuring how much each contributes to the final metric.
DAGGER vs RL: Feedback Methods for LLM Training