I don't know that RLHF would bias that kind of thing. My mental model is that annotators are shown two answers to the same prompt and asked which is "best", so if none of the test prompts happened to touch on the concept of a roadside kiosk, that vocabulary wouldn't be affected.
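As a rough illustration of that mental model, here is a toy sketch in Python. Everything in it is an assumption made up for the example: the `train_reward_weights` function, the bag-of-words scoring, and the sample answer pairs. It fits one reward weight per word from pairwise preferences using a Bradley-Terry style update. Real reward models are neural networks that generalize across related words, so this only shows the simplest version of the argument, not how any actual RLHF pipeline behaves.

```python
import math


def train_reward_weights(pairs, lr=0.1, epochs=50):
    """Fit per-word reward weights from (preferred, rejected) answer pairs.

    Toy bag-of-words model: an answer's score is the sum of its word weights.
    """
    weights = {}
    for _ in range(epochs):
        for preferred, rejected in pairs:
            good, bad = preferred.split(), rejected.split()
            # Bradley-Terry: P(preferred wins) = sigmoid(score(good) - score(bad))
            margin = (sum(weights.get(w, 0.0) for w in good)
                      - sum(weights.get(w, 0.0) for w in bad))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log P(preferred wins)
            step = lr * (1.0 - p)
            for w in good:
                weights[w] = weights.get(w, 0.0) + step
            for w in bad:
                weights[w] = weights.get(w, 0.0) - step
    return weights


# Hypothetical annotation data: (answer the annotator preferred, answer rejected).
# None of these pairs mentions a kiosk.
pairs = [
    ("please see the attached summary", "yo check the thing"),
    ("the train departs at noon", "train goes whenever"),
]

weights = train_reward_weights(pairs)
kiosk_weight = weights.get("kiosk", 0.0)  # never seen, so never updated
```

In this toy setup, words that appear only in preferred answers end up with positive weights, words that appear only in rejected answers end up negative, and a word like "kiosk" that never shows up in any compared pair gets no update at all.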
RLHF Annotation Bias and Model Vocabulary Development