It would be sad if pre-training with feedback were actually better, because pre-training is by far the most expensive part of training, and you wouldn't want to re-train from scratch every time you update your reward model.
Pre-training with Feedback: Computational Cost Concerns