Finally, we develop a benchmark that tests for racial discrimination in LM decision-making in student course admissions. In our control condition (blue), we find that more RLHF training produces model outputs that approach demographic parity but still discriminate against Black students.
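As a rough illustration of what "approaching demographic parity" means, the sketch below computes per-group admission rates and the gap between them; a gap of zero would be exact parity. The function names, group labels, and toy decisions are all hypothetical and made up for this example. They are not the benchmark's data or its actual scoring code.

```python
from collections import defaultdict


def admission_rates(decisions):
    """Return {group: admit rate} from an iterable of (group, admitted) pairs."""
    totals = defaultdict(int)
    admits = defaultdict(int)
    for group, admitted in decisions:
        totals[group] += 1
        admits[group] += int(admitted)
    return {group: admits[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions, group_a, group_b):
    """Difference in admit rates between two groups (0.0 means exact parity)."""
    rates = admission_rates(decisions)
    return rates[group_a] - rates[group_b]


# Toy, made-up decisions for illustration only (not results from the benchmark).
toy_decisions = (
    [("white", True)] * 3 + [("white", False)]
    + [("black", True)] * 2 + [("black", False)] * 2
)

rates = admission_rates(toy_decisions)
gap = demographic_parity_gap(toy_decisions, "white", "black")
print(rates)
print(gap)
```

On this toy input the admit rates are 0.75 and 0.5, so the gap is 0.25. A model that "approaches" parity would show this gap shrinking toward zero without necessarily reaching it.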
RLHF Training Reduces but Doesn’t Eliminate Racial Discrimination in Admissions