Even though they fine-tuned the model, since the data is very less, still the model is not accurate. Getting more data will solve this but human annotation is slow and expensive. So they come up with another model which is called Reward Model(RM). 6/9
Reward Model as Solution for Data Scarcity in Fine-tuned Models
By
–
Leave a Reply