Next, we found that feature steering can increase or decrease various forms of social bias in targeted ways. For example, dialing up the "Gender bias awareness" feature significantly increased gender bias scores in our evaluations.
Feature Steering Controls Social Bias in AI Models
