AI Dynamics

Global AI News Aggregator

RLHF Training Reduces but Doesn’t Eliminate Racial Discrimination in Admissions

Finally, we develop a benchmark that tests for racial discrimination in LM decision-making on student course admissions. In our control condition (blue), we find that more RLHF training produces model outputs that approach demographic parity but still discriminate against Black students.
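For readers unfamiliar with the term, demographic parity means that each group is admitted at the same rate. The following is a minimal sketch of how a parity gap could be computed from a set of admit/reject decisions. It is not the benchmark's own metric code: the function names, group labels, and toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def admission_rates(decisions):
    """Return the admission rate per group.

    `decisions` is an iterable of (group, admitted) pairs, where
    `admitted` is True if the model chose to admit the student.
    """
    totals = defaultdict(int)
    admits = defaultdict(int)
    for group, admitted in decisions:
        totals[group] += 1
        admits[group] += int(admitted)
    return {group: admits[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions, group_a, group_b):
    """Difference in admission rates between two groups.

    A gap of 0.0 corresponds to demographic parity; a positive value
    means group_a is admitted more often than group_b.
    """
    rates = admission_rates(decisions)
    return rates[group_a] - rates[group_b]


# Hypothetical toy data, NOT results from the benchmark.
toy = (
    [("white", True)] * 3 + [("white", False)] * 1
    + [("black", True)] * 2 + [("black", False)] * 2
)

gap = demographic_parity_gap(toy, "white", "black")
print(f"admission-rate gap: {gap:.2f}")
```

Under this definition, a model that "approaches demographic parity" is one whose gap shrinks toward zero, while any remaining nonzero gap reflects residual disparity.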

→ View original post on X: @anthropicai
